SCREENING FOR NOVEL COMPOUNDS WHICH REGULATE
BIOLOGICAL INTERACTIONS

Field of the Invention

The present invention relates to a method for the discovery of new bio-active
molecules, such as antibiotics, anti-virals, anti-tumor agents and regulatory proteins.
More particularly, the invention relates to a method for screening for the ability of these
molecules to affect the interactions of other proteins or of other molecules utilizing a
method for detecting the interaction of proteins or other molecules in in vivo or in vitro
systems. The invention further relates to a system for capturing genes potentially
encoding novel biochemical pathways of interest in prokaryotic or eukaryotic systems,
and screening these pathways for compounds of interest utilizing the methods presented
herein.

Background of the Invention

Within the last decade there has been a dramatic increase in the need for bioactive
compounds with novel activities. This demand has arisen largely from changes in
worldwide demographics coupled with the clear and increasing trend in the number of
pathogenic organisms that are resistant to currently available antibiotics. For example,
while there has been a surge in demand for antibacterial drugs in emerging nations with
young populations, countries with aging populations, such as the US, require a growing
repertoire of drugs against cancer, diabetes, arthritis and other debilitating conditions.
The death rate from infectious diseases increased 58% between 1980 and 1992 (1)
and it has been estimated that the emergence of antibiotic resistant microbes has added
in excess of $30 billion annually to the cost of health care in the US alone (2). As a
response to this trend pharmaceutical companies have significantly increased their
screening of microbial diversity for compounds with unique activities or specificities.

There are several common sources of lead compounds (drug candidates),
including natural product collections, synthetic chemical collections, and synthetic
combinatorial chemical libraries, such as nucleotides, peptides, or other polymeric
molecules. Each of these sources has advantages and disadvantages. The success of
programs to screen these candidates depends largely on the number of compounds
entering the programs, and pharmaceutical companies have to date screened hundreds of
thousands of synthetic and natural compounds in search of lead compounds.
Unfortunately, the ratio of novel compounds to previously discovered compounds has
diminished with time. The discovery rate of novel lead compounds has not kept pace
with demand despite the best efforts of pharmaceutical companies. There exists a strong
need for accessing new sources of potential drug candidates.

The majority of bioactive compounds currently in use are derived from soil
microorganisms. Many microbes inhabiting soils and other complex ecological
communities produce a variety of compounds that increase their ability to survive and
proliferate. These compounds are generally thought to be nonessential for growth of the
organism and are synthesized with the aid of genes involved in intermediary metabolism,
hence their name: secondary metabolites. Secondary metabolites that influence the
growth or survival of other organisms are known as bioactive compounds and serve as
key components of the chemical defense arsenal of both micro- and macroorganisms.
Humans have exploited these compounds for use as antibiotics, antiinfectives and other
bioactive compounds with activity against a broad range of prokaryotic and eukaryotic
pathogens. Approximately 6,000 bioactive compounds of microbial origin have been
characterized, with more than 60% produced by the gram positive soil bacteria of the
genus Streptomyces (3). Of these, at least 70 are currently used for biomedical and
agricultural applications. The largest class of bioactive compounds, the polyketides,
includes a broad range of antibiotics, immunosuppressants and anticancer agents which
together account for sales of over $5 billion per year.

Despite the seemingly large number of available bioactive compounds, it is clear
that one of the greatest challenges facing modern biomedical science is the proliferation
of antibiotic resistant pathogens. Because of their short generation time and ability to
readily exchange genetic information, pathogenic microbes have rapidly evolved and
disseminated resistance mechanisms against virtually all classes of antibiotic compounds.
For example, there are virulent strains of the human pathogens Staphylococcus and
Streptococcus that can now be treated with but a single antibiotic, vancomycin, and
resistance to this compound will require only the transfer of a single gene, vanA, from
resistant Enterococcus species (4). When this crucial need for novel
antibacterial compounds is superimposed on the growing demand for enzyme inhibitors,
immunosuppressants and anti-cancer agents it becomes readily apparent why
pharmaceutical companies have stepped up their screening of microbial diversity for
bioactive compounds with novel properties.

The approach currently used to screen microbes for new bioactive compounds has
been largely unchanged since the inception of the field. New isolates of bacteria,
particularly gram positive strains from soil environments, are collected and their
metabolites tested for pharmacological activity. A more recent approach has been to use
recombinant techniques to synthesize hybrid antibiotic pathways by combining gene
subunits from previously characterized pathways. This approach, called combinatorial
biosynthesis, has focused primarily on the polyketide antibiotics and has resulted in a
number of structurally unique compounds which have displayed activity (5, 6).
However, compounds with novel antibiotic activities have not yet been reported, an
observation that may be due to the fact that the pathway subunits are derived from those
genes encoding previously characterized compounds. Dramatic success in using
recombinant approaches to small molecule synthesis has been recently reported in the
engineering of biosynthetic pathways to increase the production of desirable antibiotics
(7, 8).

There is still tremendous biodiversity that remains untapped as the source of lead
compounds. However, the currently available methods for screening and producing lead
compounds cannot be applied efficiently to these under-explored resources. For instance,
it is estimated that at least 99% of marine bacterial species do not survive on laboratory
media, and commercially available fermentation equipment is not optimal for use in the
conditions under which these species will grow; hence these organisms are difficult or
impossible to culture for screening or re-supply. Recollection, growth, strain
improvement, media improvement and scale-up production of the drug-producing
organisms often pose problems for synthesis and development of lead compounds.
Furthermore, the need for the interaction of specific organisms to synthesize some
compounds makes their use in discovery extremely difficult. New methods to harness
the genetic resources and chemical diversity of compounds for
use in drug discovery are very valuable. The present invention provides a path to access
this untapped biodiversity and to rapidly screen for activities of interest utilizing
recombinant DNA technology. This invention combines the benefits associated with the
ability to rapidly screen nature with the flexibility and reproducibility afforded by
working with the genetic material of organisms.

The present invention allows one to identify genes encoding bioactivities of
interest from complex environmental gene expression libraries [...]
pathways to evolve recombinant small molecules with unique activities. Bacteria and
many eukaryotes have a coordinated mechanism for regulating genes whose products are
involved in related processes. The genes are clustered, in structures referred to as "gene
clusters," on a single chromosome and are transcribed together under the control of a
single regulatory sequence, including a single promoter which initiates transcription of
the entire cluster. The gene cluster, the promoter, and additional sequences that function
in regulation altogether are referred to as an "operon" and can include up to 20 or more
genes, usually from 2 to 6 genes. Thus, a gene cluster is a group of adjacent genes that
are either identical or related, usually as to their function. Gene clusters are of interest
in drug discovery processes since product(s) of gene clusters include, for example,
antibiotics, antivirals, [...].

Some gene families consist of one or more identical members. Clustering is a
prerequisite for maintaining identity between genes, although clustered genes are not
necessarily identical. Gene clusters range from extremes where a duplication is
generated of adjacent related genes to cases where hundreds of identical genes lie in a
tandem array. Sometimes no significance is discernible in a repetition of a particular
gene. A principal example of this is the expressed duplicate insulin genes in some
species, whereas a single insulin gene is adequate in other mammalian species.

Gene clusters undergo continual reorganization and thus, the ability to create
heterogeneous libraries of gene clusters from [...]
sources is valuable in determining sources of novel bioactivities, including enzymes such
as, for example, the polyketide synthases that are responsible for the synthesis of
polyketides having a vast array of useful activities.

Polyketides are molecules which are an extremely rich source of bioactivities,
including antibiotics (such as tetracyclines and erythromycin), anti-cancer agents
(daunomycin), immunosuppressants (FK506 and rapamycin), and veterinary products
(monensin). Many polyketides (produced by polyketide synthases) are valuable as
therapeutic agents. Polyketide synthases (PKSs) are multifunctional enzymes that
catalyze the biosynthesis of a wide variety of carbon chains differing in length and
patterns of functionality and cyclization. Despite their apparent structural diversity, they
are synthesized by a common pathway in which units derived from acetate or propionate
are condensed onto the growing chain in a process resembling fatty acid biosynthesis.
The intermediates remain bound to the polyketide synthase during multiple cycles of
chain extension and (to a variable extent) reduction of the β-ketone group formed in each
condensation. The structural variation between naturally occurring polyketides arises
largely from the way in which each PKS controls the number and type of units added,
and from the extent and stereochemistry of reduction at each cycle. Still greater diversity
is produced by the action of regiospecific glycosylases, methyltransferases and oxidative
enzymes on the product of the PKS.

Polyketide synthase genes fall into gene clusters. At least one type (designated
type I) of polyketide synthase has large size genes and encoded enzymes, complicating
genetic manipulation and in vitro studies of these genes/proteins. Progress in
understanding the enzymology of such type I systems has previously been frustrated by
the lack of cell-free systems to study polyketide chain synthesis by any of these
multienzymes, although several partial reactions of certain pathways have been
successfully assayed in vitro. Cell-free enzymatic synthesis of complex polyketides has
proved unsuccessful, despite more than 30 years of intense efforts, presumably because
of the difficulties in isolating fully active forms of these large, poorly expressed
multifunctional proteins from naturally occurring producer organisms, and because of the
relative lability of intermediates formed during the course of polyketide biosynthesis.
In an attempt to overcome some of these limitations, modular PKS subunits have been
expressed in heterologous hosts such as Escherichia coli and Streptomyces coelicolor.
Whereas the proteins expressed in E. coli are not fully active, heterologous expression of
certain PKSs in S. coelicolor resulted in the production of active protein. Cell-free
enzymatic synthesis of polyketides from PKSs with substantially fewer active sites, such
as the 6-methylsalicylate synthase, chalcone synthase, tetracenomycin synthase, and the
PKS responsible for the polyketide component of cyclosporin, has been reported.

Hence, studies have indicated that in vitro synthesis of polyketides is possible;
however, synthesis was always performed with purified enzymes. Heterologous
expression of genes encoding PKS modular subunits has allowed synthesis of functional
polyketides in vivo; however, several challenges presented by this approach
had to be overcome. The large size of modular PKS gene clusters (>30 kb) makes
their manipulation on plasmids difficult. Modular PKSs also often utilize substrates
which may be absent in a heterologous host. Finally, proper folding, assembly, and
posttranslational modification of very large foreign polypeptides are not guaranteed.

The present invention further relates to a method for discovering molecules which
affect the interaction of proteins or other molecules in in vivo or in vitro systems through
the use of fused genes encoding hybrid proteins or fused molecules capable of generating
or inhibiting, or causing the generation of or inhibition of, a detectable signal.

The analysis of interactions between proteins and/or other molecules is a
fundamental area of inquiry in biology. For instance, ligand-receptor interactions and the
receptor/effector coupling mediated by guanine nucleotide-binding proteins (G proteins)
are of interest in the study of disease. A large number of G protein-linked receptors
funnel extracellular signals as diverse as hormones, growth factors, neurotransmitters,
primary sensory stimuli, and other signals through a set of G proteins to a small number
of second-messenger systems. The G proteins act as molecular switches with an "on"
and "off" state governed by a GTPase cycle. Mutations in G proteins may result in either
constitutive activation or loss of expression. Given the variety of functions
subserved by G protein-coupled signal transduction, it is not surprising that abnormalities
in G protein-coupled pathways can lead to diseases with manifestations as diverse as
blindness, hormone resistance, precocious puberty and neoplasia. G protein-coupled
receptors are extremely important to drug research efforts. It is estimated that up to 60%
of today's prescription drugs work by somehow interacting with G protein-coupled
receptors. However, these drugs were developed using classical medicinal chemistry and
without a knowledge of the molecular mechanism of action. A more efficient drug
discovery program could be deployed by targeting individual receptors and making use
of information on gene sequence and biological function to develop effective
therapeutics. The present invention allows one to, for example, study molecules which
affect the interaction of G proteins with receptors, or of ligands with receptors.

Proteins are complex macromolecules made up of covalently linked chains of
amino acids. Each protein assumes a unique three dimensional shape determined
principally by its sequence of amino acids. Many proteins consist of smaller units termed
domains, which are continuous stretches of amino acids able to fold independently from
the rest of the protein. Some of the important forms of proteins are enzymes, polypeptide
hormones, nutrient transporters, structural components of the cell, hemoglobins,
antibodies, nucleoproteins, and components of viruses.

Protein-protein interactions enable two or more proteins to associate. A large
number of non-covalent bonds form between the proteins when two protein surfaces are
precisely matched, and these bonds account for the specificity of recognition.
Protein-protein interactions are involved, for example, in the assembly of enzyme subunits; in
antigen-antibody reactions; in forming the supramolecular structures of ribosomes,
filaments and viruses; in transport; and in the interaction of receptors on a cell with
growth factors and hormones. Products of oncogenes can give rise to neoplastic
transformation through protein-protein interactions. For example, some oncogenes
encode protein kinases whose enzymatic activity on cellular target proteins leads to the
cancerous state. Another example of a protein-protein interaction occurs when a virus
infects a cell by recognizing a polypeptide receptor on the surface, and this interaction
has been used to design antiviral agents.

Protein-protein interactions have been generally studied in the past using
biochemical techniques such as cross-linking, co-immunoprecipitation and
co-fractionation by chromatography. A disadvantage of these techniques is that interacting
proteins often exist in very low abundance and are, therefore, difficult to detect. Another
major disadvantage is that these biochemical techniques involve only the proteins, not
the genes encoding them. When an interaction is detected using biochemical methods,
the newly identified protein often must be painstakingly isolated and then sequenced to
enable the gene encoding it to be obtained. Another disadvantage is that these methods
do not immediately provide information about which domains of the interacting proteins
are involved in the interaction. Another disadvantage is that small changes in the
composition of the interacting proteins cannot be tested easily for their effect on the
interaction.

To avoid the disadvantages inherent in the biochemical techniques for detecting
protein-protein interactions, genetic systems have recently been designed. One such
system is based on transcriptional activation. Transcription is the process by which RNA
molecules are synthesized using a DNA template. Transcription is regulated by specific
sequences in the DNA which indicate when and where RNA synthesis should begin.
These sequences correspond to binding sites for proteins, designated transcription
factors, which interact with the enzymatic machinery used for the RNA polymerization
reaction. There is evidence that transcription can be activated through the use of two
functional domains of a transcription factor: a domain that recognizes and binds to a
specific site on the DNA and a domain that is necessary for activation, as reported by
Keegan et al., Science 231, 699-704 (1986) and Ma and Ptashne, Cell, 48, 847-853
(1987). The transcriptional activation domain is thought to function by contacting other
proteins involved in transcription. The DNA-binding domain appears to function to
position the transcriptional activation domain on the target gene which is to be
transcribed. In a few cases now known, these two functions (DNA-binding and
activation) reside on separate proteins. One protein binds to the DNA, and the other
protein, which activates transcription, binds to the DNA-bound protein, as reported by
McKnight et al., Proc. Nat'l Acad. Sci. USA, 89, 7061-7065 (1987); another example
is reviewed by Curran et al., Cell, 55, 395-397 (1988).

Transcriptional activation has been studied using the GAL4 protein of the yeast
Saccharomyces cerevisiae. The GAL4 protein is a transcriptional activator required for
the expression of genes encoding enzymes of galactose utilization, see Johnston,
Microbiol. Rev., 51, 458-476 (1987). It consists of an N-terminal domain which binds
to specific DNA sequences designated UASG (UAS stands for upstream activation site,
G indicates the galactose genes) and a C-terminal domain containing acidic regions,
which is necessary to activate transcription, see Keegan et al. (1986), supra, and Ma and
Ptashne (1987), supra. As discussed by Keegan et al., the N-terminal domain binds to
DNA in a sequence-specific manner but fails to activate transcription. The C-terminal
domain cannot activate transcription because it fails to localize to the UASG, see for
example Brent and Ptashne, Cell, 43, 729-736 (1985). However, Ma and Ptashne have
reported (Cell, 51, 113-119 (1987); Cell, 55, 443-446 (1988)) that when both the GAL4
N-terminal domain and the C-terminal domain are fused together in the same protein,
transcriptional activity is induced. Other proteins also function as transcriptional
activators via the same mechanism. For example, the GCN4 protein of Saccharomyces
cerevisiae, as reported by Hope and Struhl, Cell 46, 885-894 (1986), the ADR1 protein
of Saccharomyces cerevisiae, as reported by Thukral et al., Molecular and Cellular
Biology, 9, 2360-2369 (1989), and the human estrogen receptor, as discussed by Kumar
et al., Cell, 51, 941-951 (1987), all contain separable domains for DNA binding and for
maximal transcriptional activation.
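
The two-domain requirement described above can be summarized as a simple logical rule: the reporter is transcribed only when a DNA-binding function and an activation function are present in the same protein. The following Python sketch is an illustrative toy model only. The class name, field names and function name are hypothetical and are not part of the described systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    """Toy description of a transcription factor or fragment (hypothetical model)."""
    name: str
    binds_uas: bool   # carries a DNA-binding domain that recognizes the upstream site
    activates: bool   # carries a transcriptional activation domain


def reporter_transcribed(protein: Protein) -> bool:
    # Transcription requires both functions in the same protein:
    # binding positions the activation domain on the target gene.
    return protein.binds_uas and protein.activates


n_terminal = Protein("GAL4 N-terminal domain", binds_uas=True, activates=False)
c_terminal = Protein("GAL4 C-terminal domain", binds_uas=False, activates=True)
fused = Protein("GAL4 N-terminal + C-terminal", binds_uas=True, activates=True)

for p in (n_terminal, c_terminal, fused):
    print(p.name, "->", "on" if reporter_transcribed(p) else "off")
```

In this model only the fused protein switches the reporter on, matching the behavior reported for the separated and rejoined GAL4 domains.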

Genetic systems that are capable of rapidly detecting which proteins interact with
a known protein, determining which domains of the proteins interact, and providing the
genes for the newly identified interacting proteins have recently been made available in
Saccharomyces (Fields, S. and Song, O. (1989) Nature 340: 245-247;
Mullinax, R.L., and Sorge, J.A. (1995) Strategies 8:3-5). These systems are useful for
studying protein-protein interactions in vivo in a eukaryotic host. To date, this has been
viewed as advantageous because of the conditions in eukaryotic hosts that may provide
for folding, solubility and post-translational modifications (such as phosphorylation) that
may not occur in prokaryotic systems. Many eukaryotic proteins synthesized in bacteria
fold incorrectly or inefficiently and, consequently, exhibit low specific activities.
Production of authentic, biologically active eukaryotic proteins from cloned DNA
frequently requires post-translational modifications such as accurate disulfide bond
formation, glycosylation, phosphorylation, oligomerization, or specific proteolytic
cleavage, processes that are not performed by bacterial cells. This problem is particularly
severe when expression of functional membrane or secretory proteins such as cell surface
receptors and extracellular hormones or enzymes is required. Thus, the need to develop
these systems in prokaryotic screening hosts was not apparent and the advantages of such
a system were not evident until recently.

With the advent of the ability to access uncultivated organisms in samples and
archive the genes of these samples in cloning vectors in the form of gene libraries for
eventual screening for bioactive molecules, the need to utilize systems that allow for the
screening of very large numbers of clones has rapidly surfaced. Effective screening of
these gene libraries requires systems that provide high transformation efficiencies where
one can access the millions of clones representing these samples to screen. Eukaryotic
systems such as those described are unfortunately plagued with lower transformation
efficiencies. A major advantage of working with prokaryotic hosts, such as bacteria,
therefore lies in the high transformation efficiencies afforded by the utilization of these
hosts for screening. Furthermore, in working with the eukaryotic hosts described above,
it is critical that proteins are targeted to the nucleus, since the interaction has to take
place in the nucleus.

Recently, a genetic system to detect protein-protein interactions in vivo using
transcriptional repression as an assay in E. coli has been described. Genes encoding two
interacting proteins are fused to a wild type and a mutant LexA DNA binding domain (the
mutant is a truncated LexA protein devoid of its own oligomerization domain and is
termed LexA408). LexA is an efficient transcriptional repressor in E. coli only if it acts
as a dimer. This property is used to replace the LexA dimerization domain with
heterologous interacting motifs to recover repression. The non-covalent interaction
between the hybrid proteins is probed by their capacity to restore the repressor activity
of truncated LexA proteins (LexA408).

The interaction or association of the fused proteins is specifically measured using
a reporter gene controlled by a hybrid [...] operator containing a wild type half-site and
a mutated half-site (op408/op+) in a reporter strain (SU202). The lacZ reporter gene is
under control of the op408/op+ hybrid operator using the sulA promoter, the most tightly
repressed naturally occurring SOS promoter. Upon co-expression of interacting fusion
proteins, lacZ is repressed. A Lac+ phenotype yields red colonies with the system, and
a Lac- phenotype yields white colonies.
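
The readout of this repression-based system can be restated as a short decision rule: if the two fused partners associate, repression of lacZ is restored and the colony is white; otherwise lacZ is expressed and the colony is red. The Python sketch below is a toy illustration under that reading. The partner names "X", "Y" and "Z" and the lookup table are hypothetical placeholders, not data from the described system.

```python
from typing import FrozenSet, Set

# Hypothetical table of partner pairs assumed to associate (placeholder names).
INTERACTING_PAIRS: Set[FrozenSet[str]] = {frozenset({"X", "Y"})}


def lacz_repressed(partner_on_wild_type: str, partner_on_lexa408: str) -> bool:
    """Repression is restored only if the two fused partners associate."""
    return frozenset({partner_on_wild_type, partner_on_lexa408}) in INTERACTING_PAIRS


def colony_color(partner_on_wild_type: str, partner_on_lexa408: str) -> str:
    # lacZ repressed  -> Lac- phenotype -> white colony
    # lacZ expressed  -> Lac+ phenotype -> red colony
    if lacz_repressed(partner_on_wild_type, partner_on_lexa408):
        return "white"
    return "red"


print(colony_color("X", "Y"))
print(colony_color("X", "Z"))
```

A compound that disrupts the association of an interacting pair would, in this reading, shift a colony from white back to red.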

Protein fusions have also been used to detect and characterize protein-protein
interactions in E. coli using the lambda phage repressor (Hu, J.C. et al., Science 250,
1400-1403 (1990)). The NH2-terminal DNA-binding domain of bacteriophage lambda
repressor dimerizes inefficiently and requires a separate COOH-terminal dimerization
domain to bind strongly to its operator. This property allows one to evaluate the
interaction between hybrid proteins generated utilizing the binding domain and the
dimerization domain by their capacity to restore the activity of the repressor.

In addition to protein-protein interactions, the study of the interaction of
other molecules, and the ability to affect this interaction, is of interest in research and
discovery processes and in the discovery of new drugs. Examples include steroids and
their receptors, or polysaccharides and their receptors.

Summary of the Invention

The present invention allows one to clone genes potentially encoding novel
biochemical pathways of interest in eukaryotic and/or prokaryotic systems, and screen
for these pathways utilizing a novel process. Sources of the genes may be isolated,
individual organisms ("isolates"), collections of organisms that have been grown in
defined media ("enrichment cultures"), or, most preferably, uncultivated organisms
("environmental samples"). The use of a culture-independent approach to directly clone
genes encoding novel bioactivities from environmental samples is most preferable since
it allows one to access untapped resources of biodiversity.

"Environmental libraries" are generated from environmental samples and
represent the collective genomes of naturally occurring organisms archived in cloning
vectors that can be propagated in suitable prokaryotic hosts. Because the cloned DNA
is initially extracted directly from environmental samples, the libraries are not limited to
the small fraction of prokaryotes that can be grown in pure culture. Additionally, a
normalization of the environmental DNA present in these samples could allow more
equal representation of the DNA from all of the species present in the original sample.
This can dramatically increase the efficiency of finding interesting genes from minor
constituents of the sample which may be under-represented by several orders of
magnitude compared to the dominant species.
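
The effect of normalization on screening effort can be illustrated with a standard sampling estimate, which is not taken from this specification: if a fraction f of the clones in a library derives from a given organism, the number N of independent clones needed to recover at least one such clone with probability P is N = ln(1 - P) / ln(1 - f). The Python sketch below applies this estimate. The function name and the example fractions (1 in 100,000 before normalization, 1 in 1,000 after) are hypothetical values chosen only for illustration.

```python
import math


def clones_needed(fraction: float, confidence: float = 0.99) -> int:
    """Clones to screen so that at least one derives from the target organism.

    Standard sampling estimate N = ln(1 - P) / ln(1 - f); assumes clones are
    drawn independently and `fraction` is the share of clones from the target.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1 (exclusive)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1 (exclusive)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fraction))


# Hypothetical minor constituent: 1 in 100,000 clones before normalization,
# 1 in 1,000 clones after normalization of a sample with 1,000 species.
before = clones_needed(1e-5)
after = clones_needed(1e-3)
print(before, after, round(before / after))
```

Under these assumed fractions, normalization reduces the number of clones that must be screened by roughly the same factor by which the minor constituent was under-represented.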

In the evaluation of complex environmental expression libraries, a rate limiting
step occurs at the level of discovery of bioactivities. The present invention allows the
screening of complex environmental expression libraries containing, for example,
thousands of different organisms.

In the present invention, for example, gene libraries generated from one or more
uncultivated microorganisms are screened for an activity of interest. Potential pathways
encoding bioactive molecules of interest are first captured in prokaryotic cells in the form
of gene expression libraries and screened for activities of interest utilizing the methods
of the present invention. Screening hosts can be modified to contain proteins or other
molecules from metabolically rich cell lines which can aid in the expression of bioactive
compounds such as small molecules.

Thus, the present invention also allows for the transfer of cloned pathways
derived from uncultivated samples into hosts for heterologous expression and
downstream screening for bioactive compounds of interest utilizing the methods described
herein.

The present invention provides a method for screening of recombinant bioactive
and evolved compounds in vivo or in vitro using a system which can detect enhancers
and inhibitors of protein-protein or other interactions, such as those between receptors
and their cognate targets. The present invention further provides a method for screening
of recombinant bioactive, evolved, or other compounds which can affect the interaction
of molecules which interact with membrane-bound molecules (such as G proteins).

An object of this invention is to provide a method by which a multiplicity of
proteins, such as those encoded by the [...]
tested for inhibition or enhancement of other protein-protein interactions or the
interactions of other molecules. It is a further object of the present invention to provide
a method for detection of inhibition or enhancement of protein-protein interactions in
which the nucleic acid fragments which encode the interacting proteins and potentially
the inhibitor or enhancer are immediately available when a positive test occurs.

Yet another object of the present invention is to provide a method for the
identification of new genes and new gene pathways. Novel systems to clone and screen
for bioactivities of interest are desirable. The method(s) of the present invention allow
the cloning and discovery of novel, useful bioactive molecules, and in particular novel
bioactive molecules derived from uncultivated samples. The method(s) of the present
invention further allow one to screen utilizing well known genetic systems.

Accordingly, in one aspect, the present invention provides a process for
identifying clones encoding a specified activity of interest, which process comprises (i)
generating one or more expression libraries derived from nucleic acid directly isolated
from the environment; and (ii) screening said libraries utilizing a method for detecting
the inhibition or enhancement of interaction of proteins or other molecules in an in vivo
or in vitro system.

Another aspect of the present invention provides a process for identifying
compounds of interest, which process comprises (i) introducing interacting molecules
into a host cell under conditions to generate or repress a detectable signal; (ii)
introducing a third compound or gene or genes encoding a third compound into the host
cell from (i); and (iii) screening said host cell utilizing a method for detecting the
inhibition or enhancement of interaction of proteins or other molecules in an in vivo or
in vitro system.
Brief Description of the Drawings

FIGURE 1 shows one method of the present invention, which represents an
approach to screen for small molecules that enhance or inhibit protein-protein or other
interactions. The DNA binding domain and transcriptional activation domain proteins
are associated with first and second interacting molecules. Interaction of the first and
second molecules causes transcriptional activation of the detectable gene (GFP). A gene
or group of genes encoding a third molecule is introduced into the host cell, and the
ability of the third molecule to affect the interaction of the first and second molecules is
evaluated. For example, clones which alter expression of the detectable gene can be
sorted by FACS and the pathway clone isolated for characterization.

FIGURE 2 shows another method of the present invention, which represents an
approach to screen for small molecules that enhance or inhibit protein-protein or other
interactions. The DNA binding domain and transcriptional repression domain proteins
are associated with first and second interacting molecules. Interaction of the first and
second molecules promotes transcriptional repression of the detectable gene (GFP). A
gene or group of genes encoding a third molecule is introduced into the host cell, and the
ability of the third molecule to affect the interaction of the first and second molecules is
evaluated. For instance, clones which alter expression of the detectable gene can be
sorted by FACS and the pathway clone isolated for characterization.

FIGURE 3 shows another method of the present invention, which represents an
approach to screen for molecules that enhance or inhibit protein-protein or other
interactions. Signal molecules, or molecules which generate a detectable signal when
they are in sufficient proximity to each other, are associated with first and second
interacting molecules. Interaction of the first and second molecules generates a
detectable signal. A gene or group of genes encoding a third molecule is introduced into
the host cell, and the ability of the third molecule to affect the interaction of the first and
second molecules is evaluated. For instance, clones which alter the presence of the
detectable signal molecule can be sorted by FACS and the pathway clone isolated for
characterization.

FIGURE 4 shows another method of the present invention, which represents an
approach to screen for molecules that enhance or inhibit protein-protein or other
interactions. Signal molecules, or molecules which generate a detectable signal when
they are in sufficient proximity to each other, are associated with first and second
interacting molecules. Interaction of the first and second molecules generates a
detectable signal. A third molecule is introduced into the host cell, and the ability of the
third molecule to affect the interaction of the first and second molecules is evaluated. For
instance, molecules which alter the presence of the detectable signal molecule can be
sorted by FACS.

FIGURE 5 shows a scheme to capture, clone and archive large genome fragments
from uncultivated microbes from natural environments. Cloning vectors can be used in
this process which can archive from 40 kbp (fosmids) to greater than 100 kbp (BACs).

FIGURE 6 shows a high density filter array of environmental fosmid clones
probed with a labeled oligonucleotide probe. The 2400 arrayed clones contain
approximately 96 million base pairs of DNA cloned from a naturally occurring microbial
community.
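
The array size quoted for FIGURE 6 can be checked against the fosmid insert size given for FIGURE 5. The following is a minimal sketch of that arithmetic; the function name is hypothetical, and it assumes every arrayed clone carries a full-length insert of about 40 kbp.

```python
# Back-of-the-envelope check of the array size quoted for FIGURE 6.
CLONES_ON_FILTER = 2400      # arrayed clones, from the FIGURE 6 description
FOSMID_INSERT_BP = 40_000    # ~40 kbp per fosmid insert, from the FIGURE 5 description


def archived_base_pairs(n_clones: int, insert_bp: int) -> int:
    """Total cloned DNA, assuming every clone carries a full-length insert."""
    return n_clones * insert_bp


total_bp = archived_base_pairs(CLONES_ON_FILTER, FOSMID_INSERT_BP)
print(f"{total_bp / 1e6:.0f} million base pairs")  # prints "96 million base pairs"
```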
Detailed Description of Preferred Embodiments

The method of the present invention begins with the construction of gene libraries
which represent the collective genomes of naturally occurring organisms archived in
cloning vectors that can be propagated in suitable prokaryotic hosts.

The microorganisms from which the libraries may be prepared include
prokaryotic microorganisms, such as Eubacteria and Archaea, and lower eukaryotic
microorganisms such as fungi, some algae and protozoa. Libraries may be produced from
environmental samples, in which case DNA may be recovered without culturing of an
organism, or the DNA may be recovered from one or more cultured organisms. Such
microorganisms may be extremophiles, such as hyperthermophiles, psychrophiles,
psychrotrophs, halophiles, acidophiles, and the like.

The microorganisms from which the libraries may be prepared may be collected
using a variety of techniques known in the art. Samples may also be collected using the
methods detailed in the example provided below. Briefly, the example below provides
a method of selective in situ enrichment of bacterial and archaeal species while at the
same time inhibiting the proliferation of eukaryotic members of the population. In situ
enrichments can increase the likelihood of recovering rare species and previously
uncultivated members of a microbial population. If one desires to obtain bacterial and
archaeal species, nucleic acids from eukaryotes in an environmental sample can seriously
complicate DNA library construction, and eukaryotes can decrease the number of desired
bacterial species by grazing. The method described below employs selective agents, such as
antifungal agents, to inhibit the growth of eukaryotic species.

In situ enrichment is achieved by using traps composed of growth substrates and
nutritional amendments with the intent to lure, selectively, members of the surrounding
environmental matrix. Choice of substrates (carbon sources) and nutritional amendments
(i.e., nitrogen, phosphorus, etc.) is dependent upon the members of the community for
which one desires to enrich. Selective agents against eukaryotic members are also added
to the trap. Again, the exact composition depends upon which members of the
community one desires to enrich and which members of the community one desires to
inhibit. Some of the enrichment media which may be useful in pulling out particular
members of the community are described in the example provided herein.

Sources of microorganism DNA as a starting material library from which target
DNA is obtained are particularly contemplated to include environmental samples, such
as microbial samples obtained from Arctic and Antarctic ice, water or permafrost
sources, materials of volcanic origin, materials from soil or plant sources in tropical
areas, etc. Thus, for example, genomic DNA may be recovered from either a culturable
or non-culturable organism and employed to produce an appropriate recombinant
expression library for subsequent determination of a biological activity.



DNA Isolation

The preparation of DNA from the sample is an important step in the generation of
DNA libraries from environmental samples composed of uncultivated organisms, or for
the generation of libraries from cultivated organisms. DNA can be isolated from samples
using various techniques well known in the art (Nucleic Acids in the Environment:
Methods & Applications, J.T. Trevors, D.D. van Elsas, Springer Laboratory, 1995).
Preferably, DNA obtained will be of large size and free of enzyme inhibitors or other
contaminants. DNA can be isolated directly from an environmental sample (direct lysis),
or cells may be harvested from the sample prior to DNA recovery (cell separation).
Direct lysis procedures have several advantages over protocols based on cell separation.
The direct lysis technique provides more DNA with a generally higher representation
of the microbial community; however, the DNA is sometimes smaller in size and more
likely to contain enzyme inhibitors than DNA recovered using the cell separation
technique. Very useful direct lysis techniques have been described which provide DNA of
high molecular weight and high purity (Barns, 1994; Holben, 1994). If inhibitors are
present, there are several protocols which utilize cell isolation which can be employed
(Holben, 1994). Additionally, a fractionation technique, such as the bis-benzimide
separation (cesium chloride isolation) described herein, can be used to enhance the
purity of the DNA.



WO 99/19518    PCT/US98/21895




Isolation of total genomic DNA from extreme environmental samples varies
depending on the source and quantity of material. Uncontaminated, good quality (>20
kbp) DNA is required for the construction of a representative library for the present
invention. A successful general DNA isolation protocol is the standard
cetyl-trimethyl-ammonium-bromide (CTAB) precipitation technique. A biomass pellet is
lysed and proteins digested by the nonspecific protease, proteinase K, in the presence of
the detergent SDS. At elevated temperatures and high salt concentrations, CTAB forms
insoluble complexes with denatured protein, polysaccharides and cell debris.
Chloroform extractions are performed until the white interface containing the CTAB
complexes is reduced substantially. The nucleic acids in the supernatant are precipitated
with isopropanol and resuspended in TE buffer.

For cells which are recalcitrant to lysis, a combination of chemical and
mechanical methods with cocktails of various cell-lysing enzymes may be employed.
Isolated nucleic acid may then further be purified using small cesium gradients.

A further example of an isolation strategy is detailed in an example below. This
type of isolation strategy is optimal for obtaining good quality, large size DNA fragments
for cloning.

Normalization

The present invention can further optimize methods for isolation of activities of
interest from a variety of sources, including consortia of microorganisms, primary
enrichments, and environmental "uncultivated" samples. Libraries which have been
"normalized" in their representation of the genome populations in the original samples
are possible with the present invention. These libraries can then be screened utilizing the
methods of the present invention for enzyme and other bioactivities of interest.
Libraries with equivalent representation of genomes from microbes that can differ
vastly in abundance in natural populations are generated and screened. This
"normalization" approach reduces the redundancy of clones from abundant species and
increases the representation of clones from rare species. These normalized libraries
[...] novel biological catalysts.

In one embodiment, viable or non-viable cells isolated from the environment are,
prior to the isolation of nucleic acid for generation of the expression gene library, FACS
sorted to separate cells from the sample based on, for instance, DNA or AT/GC content
of the cells. Various dyes or stains well known in the art, for example those described
in "Practical Flow Cytometry", 1995 Wiley-Liss, Inc., Howard M. Shapiro, M.D., are
used to intercalate or associate with nucleic acid of cells, and cells are separated on the
FACS based on relative DNA content or AT/GC DNA content in the cells. Other criteria
can also be used to separate cells from the sample. DNA is then isolated from
the cells and used for the generation of expression gene libraries, which are then screened
for activities of interest.

Alternatively, the nucleic acid is isolated directly from the environment and is,
prior to generation of the gene library, sorted based on DNA or AT/GC content. DNA
isolated directly from the environment is used intact, randomly sheared or digested to
generate fragmented DNA. The DNA is then bound to an intercalating agent as described
above, and separated on the analyzer based on relative base content to isolate DNA of
interest. Sorted DNA is then used for the generation of gene libraries, which are then
screened for activities of interest.

As indicated, one embodiment for forming a normalized library from an
environmental sample begins with the isolation of nucleic acid from the sample. This
nucleic acid can then be fractionated prior to normalization [...]
cloning DNA from minor species from the pool of organisms sampled. DNA can be
fractionated using a density centrifugation technique, such as a cesium-chloride gradient.
When an intercalating agent, such as bis-benzimide, is employed to change the buoyant
density of the nucleic acid, gradients will fractionate the DNA based on relative base
content. Nucleic acid from multiple organisms can be separated in this manner, and this
technique can be used to fractionate complex mixtures of genomes. This can be of
particular value when working with complex environmental samples. Alternatively, the
DNA does not have to be fractionated prior to normalization. Samples are recovered
from the fractionated DNA, and the strands of nucleic acid are then melted and allowed
to selectively reanneal under fixed conditions (C0t driven hybridization). When a mixture
of nucleic acid fragments is melted and allowed to reanneal under stringent conditions,
the common sequences find their complementary strands [...]
After an optional single-stranded nucleic acid isolation step, single-stranded nucleic acid
representing an enrichment of rare sequences is amplified using techniques known
in the art, such as a polymerase chain reaction (Barnes, 1994), and used to generate gene
libraries. This procedure leads to the amplification of rare or low abundance nucleic acid
molecules, which are then used to generate a gene library which can be screened for a
desired bioactivity. While DNA will be recovered, the identification of the organism(s)
originally containing the DNA may be lost. This method offers the ability to recover
DNA from "unclonable" sources. This method is further detailed in the example below.
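
The effect of the melt-and-reanneal step on genome representation can be illustrated with a minimal sketch. It assumes idealized second-order reassociation kinetics, in which the fraction of a sequence still single-stranded after time t is 1 / (1 + k * c0 * t) for starting concentration c0. The rate constant, the two-genome community and the function names are hypothetical values chosen only for illustration; they are not taken from the procedure described here.

```python
# Idealized second-order reassociation: for a sequence at single-strand
# concentration c0, the fraction still single-stranded after time t is
# 1 / (1 + k * c0 * t). K and the abundances below are illustrative only.
K = 1.0  # hypothetical rate constant, arbitrary units


def fraction_single_stranded(c0: float, t: float, k: float = K) -> float:
    """Fraction of a sequence that has not yet reannealed at time t."""
    return 1.0 / (1.0 + k * c0 * t)


def single_stranded_shares(abundances: dict, t: float) -> dict:
    """Relative share of each genome in the single-stranded pool at time t."""
    remaining = {name: c0 * fraction_single_stranded(c0, t)
                 for name, c0 in abundances.items()}
    total = sum(remaining.values())
    return {name: amount / total for name, amount in remaining.items()}


# Hypothetical community: one dominant genome and one rare genome.
community = {"abundant": 0.99, "rare": 0.01}
for t in (0.0, 10.0, 1000.0):
    shares = single_stranded_shares(community, t)
    print(t, {name: round(share, 3) for name, share in shares.items()})
```

Under these assumptions the abundant genome reanneals faster, so the share of the rare genome in the single-stranded pool rises with time and the two shares move toward equality, which is the sense in which the single-stranded fraction is enriched for rare sequences.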

Hence, one embodiment for forming a normalized library from environmental
sample(s) is by (a) isolating nucleic acid from the environmental sample(s); (b)
optionally fractionating the nucleic acid and recovering desired fractions; and (c)
normalizing the representation of the DNA within the population so as to form a
normalized expression library from the DNA of the environmental sample(s). The
normalization process is described and exemplified in detail in co-pending, commonly
assigned U.S. Serial No. 08/665,565, filed June 1[...]

Gene Libraries

Gene libraries can be generated by inserting the normalized or non-normalized
DNA isolated or derived from a sample into a vector or a plasmid. Such vectors or
plasmids are preferably those containing expression regulatory sequences, including
promoters, enhancers and the like. Such polynucleotides can be part of a vector and/or
a composition and still be isolated, in that such vector or composition is not part of its
natural environment. Particularly preferred phage or plasmids and methods for
introduction and packaging into them are described herein.
The examples below detail procedures for producing libraries from both cultured
and non-cultured organisms.

Cloning of DNA fragments prepared by random cleavage of the target DNA can
also be used to generate [...]
vigorously passed through a 25 gauge double-hubbed needle until the sheared fragments
are in the desired size range. The DNA ends are "polished" or blunted with Mung Bean
Nuclease, and EcoRI restriction sites in the target DNA are protected with EcoRI
Methylase. EcoRI linkers (GGAATTCC) are ligated to the blunted/protected DNA using
a very high molar ratio of linkers to target DNA. This lowers the probability of two
DNA molecules ligating together to create a chimeric clone. The linkers are cut back
with EcoRI restriction endonuclease and the DNA is size fractionated. The removal of
sub-optimal DNA fragments and the small linkers is critical because their ligation to the
vector will result in recombinant molecules that are unpackageable, or in the construction
of a library containing only linkers as inserts. Sucrose gradient fractionation is used since
it is extremely easy, rapid and reliable. Although the sucrose gradients do not provide the
resolution of agarose gel isolations, they do produce DNA that is relatively free of
inhibiting contaminants. The prepared target DNA is ligated to the lambda vector,
packaged using in vitro packaging extracts and grown on the appropriate E. coli.
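
Two properties of the linker used above can be checked with a short sketch: the sequence GGAATTCC is its own reverse complement, so it anneals to itself as a blunt duplex, and it contains the EcoRI recognition sequence GAATTC, which EcoRI cuts after the first G. The example insert sequence and the function names are hypothetical, and the sketch does not model the methylase protection of sites inside the target DNA.

```python
LINKER = "GGAATTCC"      # EcoRI linker sequence given in the text
ECORI_SITE = "GAATTC"    # EcoRI recognition sequence (cut falls after the first G)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def ecori_digest(seq: str) -> list:
    """Top-strand fragments after cutting every G^AATTC site."""
    fragments, start = [], 0
    pos = seq.find(ECORI_SITE)
    while pos != -1:
        cut = pos + 1                      # cut falls after the leading G
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(ECORI_SITE, pos + 1)
    fragments.append(seq[start:])
    return fragments


# A self-complementary linker anneals to itself to form a blunt duplex.
print(reverse_complement(LINKER) == LINKER)

# Hypothetical blunt-ended insert carrying one linker on each end.
construct = LINKER + "ACGTACGTAC" + LINKER
print(ecori_digest(construct))
```

In this sketch the middle fragment begins with AATT, the 5' overhang that EcoRI leaves, which is what allows the cut-back insert to be ligated into EcoRI-digested vector arms.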

As representative examples of expression vectors which may be used there may
be mentioned viral particles, baculovirus, phage, plasmids, phagemids, cosmids, fosmids,
bacterial artificial chromosomes, viral DNA (e.g. vaccinia, adenovirus, fowl pox virus,
pseudorabies and derivatives of SV40), P1-based artificial chromosomes, yeast plasmids,
yeast artificial chromosomes, and any other vectors specific for specific hosts of interest
(such as Bacillus, Aspergillus, yeast). Thus, for example, the DNA may be included in
any one of a variety of expression vectors for expressing a polypeptide. Such vectors
include chromosomal, nonchromosomal and synthetic DNA sequences. Large numbers
of suitable vectors are known to those of skill in the art, and are commercially available.
The following vectors are provided by way of example: Bacterial: pQE vectors (Qiagen),
pBluescript plasmids, pNH vectors, ZAP vectors (Stratagene); ptrc99a, pKK223-3,
pDR540, pRIT2T (Pharmacia); Eukaryotic: pXT1, pSG5 (Stratagene), pSVK3, pBPV,
pMSG, pSVLSV40 (Pharmacia). However, any other plasmid or other vector may be
used as long as it is replicable and viable in the host. Low copy number or high copy
number vectors may be employed with the present invention.

A preferred type of vector for use in the present invention contains an f-factor
origin of replication. The f-factor (or fertility factor) in E. coli is a plasmid which effects
high frequency transfer of itself during conjugation and less frequent transfer of the
bacterial chromosome itself. A particularly preferred embodiment is to use cloning
vectors referred to as "fosmids" or bacterial artificial chromosome (BAC) vectors.
These are derived from E. coli f-factor which is able to stably integrate large segments
of genomic DNA. When integrated with DNA from a mixed uncultured environmental
sample, this makes it possible to achieve large genomic fragments in the form of a stable
"environmental DNA library."

Another preferred type of vector for use in the present invention is a cosmid
vector. Cosmid vectors were originally designed to clone and propagate large segments
of genomic DNA. Cloning into cosmid vectors is described in detail in Sambrook, et al.,
Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory
Press, 1989.

The DNA sequence in the expression vector is operatively linked to an
appropriate expression control sequence(s) (promoter) to direct [...]
Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and
trp. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early
and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of the
appropriate vector and promoter is well within the level of ordinary skill in the art. The
expression vector also contains a ribosome binding site for translation initiation and a
transcription terminator. The vector may also include appropriate sequences for
amplifying expression. Promoter regions can be selected from any desired gene using
CAT (chloramphenicol transferase) vectors or other vectors with selectable markers.

In addition, the expression vectors preferably contain one or more selectable
marker genes to provide a phenotypic trait for selection of transformed host cells such
as dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such as
tetracycline or ampicillin resistance in E. coli.

Generally, recombinant expression vectors will include origins of replication and
selectable markers permitting transformation of [...]
resistance gene of E. coli and S. cerevisiae TRP1 gene, and a promoter derived from a
highly-expressed gene to direct transcription of a downstream structural sequence. Such
promoters can be derived from operons encoding glycolytic enzymes such as
3-phosphoglycerate kinase (PGK), α-factor, acid phosphatase, or heat shock proteins,
among others. The heterologous structural sequence is assembled in appropriate phase
with translation initiation and termination sequences, and preferably, a leader sequence
capable of directing secretion of translated protein into the periplasmic space or
extracellular medium.

The cloning strategy permits expression via both vector driven and endogenous
promoters; vector promotion may be important with expression of genes whose
endogenous promoter will not function in E. coli.

The DNA derived from a microorganism(s) may be inserted into the vector by a
variety of procedures. In general, the DNA sequence is inserted into an appropriate
restriction endonuclease site(s) by procedures known in the art. Such procedures and
others are deemed to be within the scope of those skilled in the art.

The DNA selected and isolated as hereinabove described is introduced into a
suitable host to prepare a library which is screened for the desired activity. The selected
DNA is preferably already in a vector which includes appropriate control sequences
whereby selected DNA which encodes for a bioactivity may be expressed, for detection
of the desired activity. The host cell is a prokaryotic cell, such as a bacterial cell.
Particularly preferred host cells are E. coli. Introduction of the construct into the host cell
can be effected by calcium phosphate transfection, DEAE-Dextran mediated transfection,
or electroporation (Davis, L., Dibner, M., Battey, I., Basic Methods in Molecular
Biology, (1986)). The selection of an appropriate host is deemed to be within the scope
of those skilled in the art from the teachings herein.
Host cells are genetically engineered (transduced or transformed or transfected)
with the vectors. The engineered host cells can be cultured in conventional nutrient
media modified as appropriate for activating promoters, selecting transformants or
amplifying genes. The culture conditions, such as temperature, pH and the like, are those
previously used with the host cell selected for expression, and will be apparent to the
ordinarily skilled artisan.

Since it appears that many bioactive compounds of bacterial origin are encoded
in contiguous multigene pathways varying from 15 to 100 kbp (11, 12), cloning large
genome fragments is preferred with the present invention, in order to express novel
pathways from natural assemblages of microorganisms. Capturing and replicating DNA
fragments of 40 to 100 kbp in surrogate hosts such as E. coli, Bacillus or Streptomyces
is in effect propagating uncultivated microbes, albeit in the form of large DNA
fragments each representing from 2 to 5% of a typical eubacterial genome.
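
The two ranges quoted above (40 to 100 kbp per fragment, 2 to 5% of a genome per fragment) can be cross-checked with a minimal sketch. The implied genome size is an inference from those two ranges and is not a figure stated in the text; the function name is hypothetical.

```python
def implied_genome_size_bp(insert_bp: float, fraction_of_genome: float) -> float:
    """Genome size implied by an insert that covers the given fraction of it."""
    return insert_bp / fraction_of_genome


# Pair the ends of the two ranges quoted in the text.
low_end = implied_genome_size_bp(40_000, 0.02)    # 40 kbp covering 2%
high_end = implied_genome_size_bp(100_000, 0.05)  # 100 kbp covering 5%

print(f"{low_end / 1e6:.1f} Mbp, {high_end / 1e6:.1f} Mbp")
```

Both ends of the range correspond to a genome of roughly 2 Mbp, so the percentages are consistent with the fragment sizes under that assumption.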

Two hurdles that must be overcome to successfully capture large genome
fragments from naturally occurring microbes and to express multigene pathways from
subsequent clones are 1) the low cloning efficiency of environmental DNA and 2) the
inherent instability of large clones. To overcome these hurdles, high quality large
molecular weight DNA is extracted directly from soil and other environments and
vectors such as the f-factor based Bacterial Artificial Chromosome (BAC) vectors are
used to efficiently clone and propagate large genome fragments. The environmental
library approach (Figure 1) will process such samples with an aim to archive and
replicate with a high degree of fidelity the collective genomes in the mixed microbial
assemblage. The basis of this approach is the application of modified Bacterial Artificial
Chromosome (BAC) vectors to stably propagate 100-200 kbp genome fragments. The
BAC vector and its derivative the fosmid (for f-factor based cosmid) use the f-origin of
replication to maintain copy number at one or two per cell (14). This feature has been
shown to be a crucial factor in maintaining stability of large cloned fragments (15). High
fidelity replication is especially important in propagating libraries comprised of high GC
organisms such as the Streptomyces, from which clones may be prone to rearrangement
and deletion of duplicate sequences.
Because the fosmid vector uses the highly efficient lambda packaging system,
comprehensive libraries can be assembled with a minimal amount of starting DNA.
Environmental fosmid libraries of 4X10' clones of the present invention can be
generated, each containing approximately 40 kbp [...]
purified DNA collected from samples, including, for example, from the bug traps
described herein.

A potential problem with constructing libraries for the expression of bioactive
compounds in E. coli is that this gram-negative bacterium may not have the appropriate
genetic background to express the compounds in their active form. One aspect of the
present invention allows the efficient cloning of fragments in E. coli and the subsequent
transfer to a different suitable host for expression and screening. Shuttle vectors, which
allow propagation in two different types of hosts, can be utilized in the present invention
to clone and propagate in bacterial hosts, such as E. coli, and transfer to alternative hosts
for expression of active molecules. Such alternative hosts may include but are not
limited to, for example, Streptomyces or Bacillus, or other metabolically rich hosts such
as Cyanobacteria, Myxobacteria, etc. Streptomyces lividans, for instance, may be used
as the expression host for the cloned pathways. This strain is routinely used in the
recombinant expression of heterologous antibiotic pathways because it recognizes a large
number of promoters and appears to lack a restriction system (Guseck, T.W. & Kinsella,
J.E. (1992) Crit. Rev. Microbiol. 18, 247-260).

In the present invention, the example below describes a shuttle vector which can
be utilized. The vector is an E. coli-Streptomyces shuttle vector. This system allows
one to stably clone and express large inserts (40 kbp genome fragments). Chromosomally
integrated recombinants can be recovered as the original fosmid to facilitate sequence
characterization and further manipulation of positive clones. Replicons which allow
regulation of the clone copy number in hosts can be utilized. For instance, the SCP2
replicon, a 32 kb fertility plasmid that is present at one copy per cell in Streptomyces
coelicolor, can be utilized. This replicon can be "tuned" by truncation to replicate at
various copy numbers in Streptomyces hosts. For instance, replicative versions of
integrative shuttle vectors may be designed containing the full length and truncated SCP2
replicon which will regulate the clone copy number in the Streptomyces host from 1 to
10 copies per cell.

In order to ensure that the bioactivity of the clones containing the putative
polyketide or other clustered genes is not due to the activation of any resident gene
cluster, the resident gene sequences can be removed from the host strain by gene
replacement or deletion. An example is presented below.

Biopanning

After the expression libraries have been generated, one can include the additional
step of "biopanning" such libraries prior to transfer to a second host for screening. The
"biopanning" procedure refers to a process for identifying clones having a specified
biological activity by screening for sequence homology in a library of clones prepared
by (i) selectively isolating target DNA, from DNA derived from at least one
microorganism, by use of at least one probe DNA comprising at least a portion of a DNA
sequence encoding a biological having the specified biological activity; and (ii)
transforming a host with isolated target DNA to produce a library of clones which are
then processed for screening for the specified biological activity.
The procedure of "biopanning" is described and exemplified in U.S. Serial No.
08/692,002, filed August 2, 1996.

Further, it is possible to combine all the above embodiments such that a
normalization step is performed prior to generation of the expression library, the
expression library is then generated, the expression library so generated is then
biopanned, and the biopanned expression library is then screened using a high throughput
cell sorting and screening instrument. Thus there are a variety of options, i.e.: (i) one can
just generate the library and then screen it; (ii) normalize the target DNA, generate the
expression library and screen it; (iii) normalize, generate the library, biopan and screen;
or (iv) generate, biopan and screen the library.
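
The four options listed above amount to two independent yes/no choices: whether to normalize before library generation, and whether to biopan before screening. A minimal sketch that enumerates them follows; the function name and step labels are hypothetical and only restate the options in the text.

```python
from itertools import product


def workflow(normalize: bool, biopan: bool) -> list:
    """Ordered steps for one combination of the two optional stages."""
    steps = []
    if normalize:
        steps.append("normalize target DNA")
    steps.append("generate expression library")
    if biopan:
        steps.append("biopan library")
    steps.append("screen")
    return steps


# Enumerate every combination of the two optional stages.
for normalize, biopan in product((False, True), repeat=2):
    print(" -> ".join(workflow(normalize, biopan)))
```

Library generation and screening appear in every path; only the normalization and biopanning stages are optional.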
Screening

The present invention offers the ability to screen for many types of
bioactivities, in particular bioactivities which are enhancers and inhibitors of
protein-protein or other interactions, such as those between transcription factors and
their activators, or receptors and their cognate targets.

The biopanning approach described above can be used to create libraries enriched
with clones carrying sequences homologous to a given probe sequence. Using this
approach, libraries containing clones with inserts of up to 40 kbp can be enriched
approximately 1,000 fold after each round of panning. This enables one to reduce the
above 3,000 filter fosmid library to 3 filters after 1 round of biopanning enrichment. This
approach can be applied to create libraries enriched for clones carrying desirable
sequences.
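
The reduction from 3,000 filters to 3 filters follows directly from the quoted 1,000-fold enrichment per round. A minimal sketch of that arithmetic is shown here; the function name is hypothetical, and it assumes the enrichment factor is the same in every round and that at least one filter is always needed.

```python
import math


def filters_after_panning(filters: int, rounds: int, fold: float = 1000.0) -> int:
    """Filters left to screen after the given number of enrichment rounds."""
    return max(1, math.ceil(filters / fold ** rounds))


print(filters_after_panning(3000, rounds=1))  # prints 3
```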

Hybridization screening using high density filters or biopanning has proven an
efficient approach to detect homologues of pathways containing conserved genes. To
discover novel bioactive molecules that may have no known counterparts, however, other
approaches are necessary.

Thus, one aspect of the present invention provides a method for detecting
molecules which affect the interaction between a first protein and a second protein, or
between two or more molecules. Molecules to be evaluated may be encoded for by
one or more genes, or may include other molecules not encoded for by nucleic acids,
including nucleic acids themselves or other molecules generated via, for example,
combinatorial chemistry technologies. Such molecules would include natural or
synthesized peptides, natural products and synthesized products.

Prokaryotic or eukaryotic hosts may be utilized with the method of the present
invention. The host cell may contain a detectable gene having a binding site for the
DNA-binding domain of a transcriptional activator, such that the detectable gene
expresses a detectable protein when the detectable gene is transcriptionally activated.
Such activation occurs when the transcriptional activation domain of a transcriptional
activator is brought into sufficient proximity to the DNA-binding domain of the
transcriptional activator. Alternatively, the host cell may contain a detectable gene having
a binding site for a binding domain of a transcriptional repressor, such that the
detectable gene expresses when the repressor is not bound. Such repression occurs
when the domains of the repressor are brought into sufficient proximity to each other.
Alternatively, interacting molecules can be fused or associated with molecules
which, when brought in sufficient proximity to each other, generate a detectable
signal.

In one aspect of the present invention, a first chimeric gene is provided which
is capable of being expressed in the host cell. The first chimeric gene comprises a
DNA sequence that encodes a first hybrid protein. The first hybrid protein contains
either a DNA-binding domain that recognizes the binding site near the detectable
gene in the host cell, or a molecule which, when brought into sufficient proximity to
the molecule on a second hybrid protein, generates a detectable signal. The first
hybrid protein also contains a first protein or protein fragment which is to be
interacted with a second protein or protein fragment. The first chimeric gene may be
present in a chromosome of the host cell, or it may be encoded on a library of
plasmids or other vectors that contain genomic, cDNA or synthetically generated
DNA sequences fused to the DNA sequence encoding the DNA-binding domain.

A second chimeric gene is provided which is capable of being expressed in the
host cell. In one embodiment, both the first and the second chimeric genes are
introduced into the host cell in the form of plasmids or other vectors. In another
embodiment, the first chimeric gene is present in a chromosome of the host cell and
the second chimeric gene is introduced into the host cell as part of a plasmid or other
type of vector. The second chimeric gene contains a DNA sequence that encodes a
second hybrid protein. The second hybrid protein contains a transcriptional activation
domain which, when interacted with the binding domain and the operator or other
sequence near the detectable gene in the host cell, causes transcriptional activation of
the detectable gene. Alternatively, the second hybrid protein contains a dimerization
or other domain ("transcriptional repressor") which, when interacted with the binding
domain and the operator or other sequence near the detectable gene in the host cell,
causes transcriptional repression of the detectable gene. Alternatively, the second
hybrid protein contains a molecule which, when brought into sufficient proximity to
the molecule on a first hybrid protein, generates a detectable signal. The second
hybrid protein also contains a second protein or a protein fragment which is to be
tested for interaction with the first protein or protein fragment. Preferably, the
DNA-binding domain of the first hybrid protein and the transcriptional activation domain or
transcriptional repressor of the second hybrid protein are derived from transcriptional
activators or repressors having separate DNA-binding and transcriptional activation
domains or dimerization or other domains as described above. Many proteins
involved in transcription have separable binding and transcriptional activation or
"repressor" (as described above) domains which make them useful for the present
invention. In another embodiment, the DNA-binding domain and the transcriptional
activation or dimerization or other domain (as described above) may be from different
transcriptional activators or repressors. The second hybrid protein may be encoded
on a library of plasmids or other vectors that contain genomic, cDNA or synthetically
generated DNA sequences fused to the DNA sequence encoding the transcriptional
activation domain or transcriptional repressor.

Alternatively, in the method of the present invention, a first test protein
associated with a DNA-binding domain and a second test protein associated with a
transcriptional activator or repressor may also be introduced into the cell as protein
products instead of as genes as described above. Said associated proteins may be
generated or synthesized in vitro or in vivo, and protein products may be introduced
into the host cell for screening utilizing the methods of the present invention. If
activators or repressors are not employed to activate or repress the expression of a
detectable gene, the first test protein and the second test protein can be associated
with molecules which generate a detectable signal when they are brought into
sufficient proximity to each other, and introduced into host cells to be screened using
the methods of the present invention.

Therefore, in the case of the utilization of a transcriptional activator,
interaction between the first protein and the second protein in the host cell causes the
transcriptional activation domain to activate transcription of the detectable gene. The
method is carried out by introducing the first chimeric gene and the second chimeric
gene into the host cell. The host cell is subjected to conditions under which the first
hybrid protein and the second hybrid protein are expressed in sufficient quantity for
the detectable gene to be activated. The cells are then tested for their expression of
the detectable gene to a greater degree than in the absence of an interaction between
the first protein and the second protein. A third gene or gene cluster is then
introduced into the host cell, and an enhancement or inhibition of the interaction of
the first and second proteins is evaluated by increased or decreased expression of the
detectable gene. Enhancement of the interaction between the first and second proteins
could yield an increase in expression of the detectable gene, while inhibition of the
interaction between the first and second protein would yield a decrease in expression
of the detectable gene. The third gene or gene cluster can also be introduced into the
cell prior to the introduction of the first and/or second gene, or simultaneously with
the first and/or second gene. Alternatively, the molecule to be evaluated for its effect
on the interaction of the first two proteins can be introduced directly into the host cell
not in the form of a gene or gene cluster, but as a product.

In the case of the utilization of a transcriptional repressor, interaction between
the first protein and the second protein in the host cell causes the transcriptional
repression domain to repress transcription of the detectable gene. The method is
carried out by introducing the first chimeric gene and the second chimeric gene into
the host cell. The host cell is subjected to conditions under which the first hybrid
protein and the second hybrid protein are expressed in sufficient quantity for the
detectable gene to be inactivated. The cells are then tested for their lack of expression
of the detectable gene. A third gene or gene cluster is then introduced into the host
cell, and an inhibition of the interaction between the first and second protein results in
an increase in expression of the detectable gene. Again, the third gene or gene cluster
can be introduced into the cell prior to the introduction of the first and/or second
gene, or simultaneously with the first and/or second gene, and the molecule to be
evaluated for its effect on the interaction of the first two proteins can be introduced
directly into the host cell not in the form of a gene or gene cluster, but as a product.

In the case of the utilization of molecules which can associate when in
proximity to each other to generate a detectable signal, interaction between the first
protein and the second protein in the host cell causes a detectable signal to be
generated. The method is carried out by introducing the first chimeric gene and the
second chimeric gene into the host cell. The host cell is subjected to conditions under
which the first hybrid protein and the second hybrid protein are expressed in
sufficient quantity for the detectable signal to be generated. The cells are then tested
for the presence of the detectable signal. A third gene or gene cluster is then
introduced into the host cell, and an enhancement or inhibition of the interaction
between the first and second protein results in an increase or decrease in detectable
signal. Again, the third gene or gene cluster can be introduced into the cell prior to
the introduction of the first and/or second gene, or simultaneously with the first and/or
second gene, and the molecule to be evaluated for its effect on the interaction of the
first two proteins can be introduced directly into the host cell not in the form of a gene
or gene cluster, but as a product.
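
By way of illustration only, the direction of the readout in the three cases described
above can be summarized in a short sketch. The names `ReadoutMode` and
`classify_third_molecule`, the numeric readout values and the `tolerance` parameter are
hypothetical and are introduced only for this example. For the repressor case the text
states only that inhibition increases expression; treating a further decrease as
enhancement is an inference made for the sketch.

```python
from enum import Enum


class ReadoutMode(Enum):
    ACTIVATOR = "activator"        # interaction switches the detectable gene on
    REPRESSOR = "repressor"        # interaction switches the detectable gene off
    PROXIMITY_SIGNAL = "signal"    # interaction itself generates the signal


def classify_third_molecule(mode: ReadoutMode, baseline: float,
                            with_third: float, tolerance: float = 0.0) -> str:
    """Classify a third molecule from the change in the measured readout.

    baseline   -- readout with only the two interacting hybrids present
    with_third -- readout after the third gene, gene cluster or product is added
    """
    delta = with_third - baseline
    if abs(delta) <= tolerance:
        return "no effect"
    readout_rose = delta > 0
    if mode is ReadoutMode.REPRESSOR:
        # Inverted logic: losing the interaction relieves repression, so the
        # readout rises on inhibition (the enhancer branch is inferred).
        return "inhibitor" if readout_rose else "enhancer"
    # Activator and proximity-signal cases: readout follows the interaction.
    return "enhancer" if readout_rose else "inhibitor"


if __name__ == "__main__":
    print(classify_third_molecule(ReadoutMode.ACTIVATOR, 10.0, 2.0))
    print(classify_third_molecule(ReadoutMode.REPRESSOR, 1.0, 8.0))
```

In practice a nonzero `tolerance` would be chosen from the measurement noise of the
particular detection system, which the present description does not specify.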

Thus, enhancement or inhibition of interactions between a first protein and a
second protein by a library of third test proteins or molecules can be tested. For
example, the first and second proteins may be derived from bacteria or viruses,
and/or may be an oncogene-encoded protein, a growth factor or an enzyme. The third
protein may be derived from a gene library described herein or may be any molecule
desired to be screened for its potential to affect the interaction of other molecules.

The screening aspect of the present invention may be practiced using three 
vectors and a host cell. The first vector contains a promoter and may include a 
transcription termination signal functionally associated with the first chimeric gene in 
order to direct the transcription of the first chimeric gene. The first chimeric gene
includes a DNA sequence that encodes a DNA-binding domain, and a unique 
restriction site(s) for inserting a DNA sequence encoding a first protein or protein 
fragment in such a manner that the first protein is expressed as part of a hybrid protein 
with the DNA-binding domain. The first vector also includes a means for replicating 
itself in the host cell. Also included on the first vector is a first marker gene, the
expression of which in the host cell permits selection of cells containing the first
marker gene from cells that do not contain the first marker gene. Preferably, the first
vector is a phage, cosmid, plasmid, phagemid, or fosmid or other BAC vector.

The second vector will contain a second chimeric gene. The second chimeric
gene also includes a promoter and a transcription termination signal to direct
transcription. The second chimeric gene also includes a DNA sequence that encodes
a transcriptional activation domain and a unique restriction site(s) to insert a DNA
sequence encoding the second protein or protein fragment into the vector, in such a
manner that the second protein is capable of being expressed as part of a hybrid
protein with the transcriptional activation domain. Preferably, the DNA-binding
domain of the first hybrid protein and the transcriptional activation domain of the
second hybrid protein are derived from transcriptional activators having separate
DNA-binding and transcriptional activation domains. Many proteins involved in
transcription have separable binding and transcriptional activation domains which
make them useful for the present invention. In another embodiment, the DNA-binding
domain and the transcriptional activation domain may be from different
transcriptional activators. The second hybrid protein may be encoded on a library of
vectors that contain genomic, cDNA or synthetically generated DNA sequences fused
to the DNA sequence encoding the transcriptional activation domain.

Alternatively, the second chimeric gene instead includes a DNA sequence
that encodes a transcriptional repression domain, and a unique restriction site(s) to
insert a DNA sequence encoding the second protein or protein fragment into the
vector, in such a manner that the second protein is capable of being expressed as part
of a hybrid protein with the transcriptional repression domain. Preferably, the
DNA-binding domain of the first hybrid protein and the transcriptional repression domain of
the second hybrid protein are derived from transcriptional repressors having separate
DNA-binding and transcriptional repression domains. Many proteins involved in
transcription have separable binding and transcriptional repression domains which
make them useful for the present invention. In another embodiment, the DNA-binding
domain and the transcriptional repression domain may be from different
transcriptional repressors. The second hybrid protein may be encoded on a library of
vectors that contain genomic, cDNA or synthetically generated DNA sequences fused
to the DNA sequence encoding the transcriptional repression domain.

The second vector further includes a means for replicating itself in the host
cell. The second vector also includes a second marker gene, the expression of which
in the host cell permits selection of cells containing the second marker gene from cells
that do not contain the second marker gene.

The third vector contains a promoter and may include a transcription 
termination signal functionally associated with the third gene or gene cluster in order 
to direct the transcription of the third gene or gene cluster. The third vector includes a
unique restriction site(s) for inserting a DNA sequence encoding a third protein, 
group of proteins (for example, those encoded by a gene cluster) or protein fragment. 
The third vector also includes a means for replicating itself in the host cell and in 
bacteria. The third vector can also include a third marker gene, the expression of 
which in the host cell permits selection of cells containing the third marker gene from
cells that do not contain the third marker gene. Preferably, the third vector is a phage, 
cosmid, plasmid, phagemid, or fosmid or other BAC vector.

Alternatively, the screening aspect of the present invention is not practiced
using vectors, but rather using interacting hybrid molecules associated with
DNA-binding domains and transcriptional activators or repressors, or with other molecules
which generate a detectable signal when brought into sufficient proximity to each 
other, and a host cell. 

The host cell is a eukaryotic or a prokaryotic cell. The host cell contains the
detectable gene having a binding site for the DNA-binding domain of the first hybrid
protein. The binding site is positioned so that the detectable gene expresses a
detectable protein when the detectable gene is activated by the transcriptional
activation domain encoded by the second vector. Activation of the detectable gene is
possible when the transcriptional activation domain is in sufficient proximity to the
detectable gene. Alternatively, in the case of the utilization of a transcriptional
repressor, the repressor binding site is positioned so that the detectable gene does not
express the detectable protein when the detectable gene is repressed by the
transcriptional repression domain encoded by the second vector. Repression of the
detectable gene is possible when the transcriptional repression domain is in sufficient
proximity to the detectable gene. Alternatively, in the case of the use of other
associating molecules which associate to generate a detectable signal, no detectable
gene is present in the host cell. The host cell, by itself, is incapable of expressing a
protein having a function of the first marker gene, the second marker gene, the
DNA-binding domain, the transcriptional activation domain, or the molecules capable of
associating to generate a detectable signal. The DNA-binding domain and the
transcriptional activation or repression domain, and the associating molecules in the
latter case, are incapable of interacting with each other unless the first and second
proteins bring them together by their interaction.

Accordingly, in the case of the utilization of a transcriptional activator, the
interaction of the first protein and the second protein in the host cell causes a
measurably greater expression of the detectable gene than when the DNA-binding
domain and the transcriptional activation domain are present in the absence of an
interaction between the first protein and the second protein. Alternatively, in the case
of the utilization of a transcriptional repressor, the interaction of the first protein and
the second protein in the host cell causes repression of expression of the detectable
gene.

The detectable gene may encode an enzyme or other product that can be
readily measured. Such measurable activity may include the ability of the cell to
grow only when the marker gene is transcribed, or the presence of detectable enzyme
activity only when the marker gene is transcribed. Various other markers are well
known by the skilled artisan.

Fluorescent indicators which interact when associated to create a detectable
signal may be utilized in the method of the present invention. For example, green
fluorescent protein (GFP) mutants can be generated which have increased
fluorescence resonance energy transfer between flanking GFPs. Thus fluorescence
resonance energy can be transferred and measured upon association of mutant GFP
molecules. First test proteins associated with one GFP mutant, and second test
proteins associated with another GFP mutant, can come together, and an energy transfer
can occur and be measured via the association of the two GFP molecules. Hybrid
molecules are structured such that GFP mutants are not interacting independently, but
only upon interaction of the proteins or molecules associated with the mutants. Third
molecules can then be evaluated for their capability to affect the interaction of the
first two molecules.
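
By way of illustration only, the dependence of such an energy-transfer signal on
proximity can be sketched with the standard Förster relation, E = 1 / (1 + (r/R0)^6),
where r is the donor-acceptor distance and R0 is the distance at which transfer is 50%
efficient. The present description gives no distances or Förster radius; the 5 nm
default below and the function name `fret_efficiency` are assumed placeholders chosen
only to make the sketch runnable.

```python
def fret_efficiency(distance_nm: float, forster_radius_nm: float = 5.0) -> float:
    """Return the Förster energy-transfer efficiency for a donor-acceptor pair.

    forster_radius_nm is a hypothetical placeholder value, not a measured
    property of any particular pair of GFP mutants.
    """
    if distance_nm <= 0 or forster_radius_nm <= 0:
        raise ValueError("distances must be positive")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


if __name__ == "__main__":
    # Efficiency falls off steeply with distance, so a signal is expected only
    # when the two test proteins hold the fluorophores close together.
    for r in (2.5, 5.0, 10.0):
        print(r, round(fret_efficiency(r), 4))
```

The steep sixth-power fall-off is what makes such a pair useful as a proximity reporter:
under the assumed radius, doubling the separation from R0 reduces the transfer
efficiency from one half to under two percent.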

The cells containing the two hybrid proteins are incubated in an appropriate
medium and the culture is monitored for the measurable activity. In the case of the
utilization of a transcriptional activator, a positive indication that the first protein and
the second protein have interacted is expression of the detectable gene. Such
interaction brings their respective DNA-binding and transcriptional activation
domains into sufficiently close proximity to cause transcription of the marker gene. In
the case of the utilization of a transcriptional repressor, a positive indication that the
first protein and the second protein have interacted is repression of expression of the
detectable gene. Such interaction brings their respective DNA-binding and
transcriptional repression domains into sufficiently close proximity to cause
repression of transcription of the marker gene. In the case where two molecules are
coming together in close enough proximity to generate a detectable signal, a positive
indication that the first and second molecules have interacted is detection of the
signal.

The third vector containing the third gene or gene cluster encoding a third
protein or group of proteins is introduced into the cells containing the two hybrid
proteins which are interacting. Alternatively, the third protein or other molecule is
introduced directly into the cells containing the two hybrid proteins which are
interacting. Expression of a third protein or group of proteins, or the presence of a
third molecule which inhibits or enhances the interaction of the first and second
proteins, yields a measurable difference in expression of the detectable gene. Such
difference is evaluated by detection of the amount of detectable molecules produced
by the detectable gene. In the case of the utilization of a transcriptional activator,
inhibition of the interaction between the two proteins results in a decrease in the
expression levels of the detectable molecule and enhancement of the interaction
between the two proteins results in an increase in the expression levels of the
detectable molecule. In the case of the utilization of a transcriptional repressor,
inhibition of the interaction between the two proteins results in an increase in the
expression levels of the detectable molecule. Enhancement or inhibition of the
interaction between the first two proteins in the case of the utilization of associating
molecules which generate a detectable signal results in an increase or decrease in the
detectable signal, respectively.

The system is dependent on a number of conditions to properly carry out the
method of this invention. The first interacting protein X must not itself carry an
activation or repression domain for the marker. Otherwise the activation or repression
domain would allow or repress transcription of the marker gene as soon as the
vector encoding only the DNA-binding domain fused to the first interacting protein X
is introduced. The interaction between the first protein X and the second protein Y
must be capable of occurring within the host cell. The activation domain portion of
the hybrid containing the second protein Y must be accessible to the transcription
machinery of the cell to allow transcription of the marker gene. In the case of the
utilization of a transcriptional repressor, the detectable gene should be expressing in
the absence of the interaction of the two hybrid proteins. Should any of these
conditions not exist, the system may be modified for use by constructing hybrids that
carry only portions of the interacting proteins X and Y and thus meet these
conditions.

This system can be used to select generically for proteins or groups of proteins
that inhibit or enhance the interaction of other proteins or molecules. Prokaryotes
containing a known protein as a hybrid with the DNA-binding domain can be
transformed with a clone bank of genomic or cDNA sequences fused to the activation
or repression domain. The double transformants can be screened for expression of the
detectable marker. The third protein or group of proteins can then be introduced and
expressed. Inhibition or enhancement of the interaction between the first and second
proteins can then be evaluated.

Since prokaryotes use similar transcription mechanisms, a variety of cells can
be used to test for protein-protein interactions. The detectable gene function can be
served by any of a large variety of genes, such as genes encoding drug resistance or
metabolic enzymes. Any transcriptional activator or repressor that has separable
domains for DNA-binding and for transcriptional activation or repression can be
employed. Indeed, any protein, even one that is not a transcriptional activator or
repressor, that has two separable functions can be used to establish a similar genetic
system to detect enhancement or inhibition of protein-protein or other interactions.

Accordingly, the method of the present invention can be applied more
generally to utilize any detectable function requiring separable domains of an amino
acid sequence which can be reconstituted. This general embodiment of the present
invention detects inhibition or enhancement of the interaction between a first protein
and a second protein. The method includes providing a host cell which is defective in
a detectable function. The detectable function is provided by an amino acid sequence
having separable domains. Thus, the amino acid sequence includes a first domain and
a second domain which are capable of producing the detectable function when they
are in sufficient proximity to each other in the host cell.

A first chimeric gene is provided that is capable of being expressed in the host
cell. The first chimeric gene includes a DNA sequence that encodes a first hybrid
protein. The first hybrid protein contains the first domain of the amino acid sequence.
The first hybrid protein also contains a first protein or protein fragment which is to be
interacted with a second protein or protein fragment.

A second chimeric gene is provided which is capable of being expressed in the
host cell. The second chimeric gene contains a DNA sequence that encodes a second
hybrid protein. The second hybrid protein contains the second domain of the amino
acid sequence. The second hybrid protein also contains a second protein or protein
fragment which is to be interacted with a first protein or protein fragment.

The interaction between the first protein and the second protein in the host
cell causes the function of the amino acid sequence to be reconstituted. The method
is thus carried out by introducing the first chimeric gene and the second chimeric gene
into the host cell. The host cell is subjected to conditions under which the first hybrid
protein and the second hybrid protein are expressed in sufficient quantity for the
function of the amino acid sequence to be reconstituted. The cells are then tested to
determine whether their expression of the function of the amino acid sequence has
been reconstituted to a degree greater than in the absence of the interaction of the test
proteins. The third vector encoding a third protein or group of proteins is then
introduced into the host cell. A protein or group of proteins which enhance or inhibit
the interaction of the first and second proteins is then determined by evaluating
expression levels of a detectable gene.

This generalized method can be made more specific, for example, as described
for the preferred method of the present invention in which the detectable function is
transcription of a detectable gene or repression of transcription of a detectable gene.
In this method, the first domain of the amino acid sequence includes a DNA-binding
domain that recognizes a binding site on the detectable gene, and the second domain of the
amino acid sequence includes a transcriptional activation or repression domain.

In the generalized method described above, the host cell is a prokaryotic or
eukaryotic cell. In carrying out this method, the first and/or second proteins may be
derived from a bacterial protein, a viral protein, an oncogene-encoded protein, a
growth factor or an enzyme. The second hybrid protein may also be encoded on a
library of plasmids containing DNA inserts that are derived from genomic DNA,
cDNA, or synthetically generated DNA sequences fused to the DNA sequence
encoding the second amino acid domain. The third protein or group of proteins may
be encoded on a library of plasmids containing DNA inserts that are derived from
genomic DNA, cDNA, or synthetically generated DNA sequences fused to the DNA
sequence encoding the third amino acid domain.

The method of the present invention may also be utilized to evaluate the
inhibition or enhancement of interaction of molecules other than proteins. Interacting
molecules can be fused to the separate domains (DNA-binding domains and
transcriptional activation or repression domains) of transcriptional activators or
repressors. Typically, transcriptional activators or repressors are proteins.
Alternatively, interacting molecules can be fused to molecules which, when brought
into sufficient proximity to each other, can generate a detectable signal. Certain
linkers for linking or associating two or more molecules together, and/or for linking
molecules to proteins, are known in the art (EXAMPLES). Linkage or association of
interacting molecules to these domains or molecules yields hybrid molecules that,
upon interaction of the two interacting molecules, are capable of activating or
repressing expression of a detectable gene, or generating a detectable signal
themselves, as described above. The hybrid molecules are introduced into host cells.
A library of third proteins or groups of proteins is then introduced into the same host
cells, either by introduction of the gene encoding the protein or genes encoding these
proteins, or by introduction of the proteins themselves, or of molecules generated via
expression of the multiple proteins (for example, the molecules generated via
expression of a gene cluster or pathway), or of molecules desired to be evaluated,
such as molecules generated via combinatorial chemistry technologies. Enhancement
or inhibition of the interaction between the interacting molecules can then be
evaluated by the effects on gene expression of the detectable molecule, or presence of
the detectable molecule, as described above.

FACS screening of clones using the methods of the present invention can be
performed as described in U.S. Patent Application Number ________, filed
June 16, 1997. Other devices which utilize detectors capable of detecting any
detectable molecule utilized in a method of the present invention may be employed.
Such devices include, but are not limited to, a variety of high throughput cell
sorting instruments, robotic instruments, and time-resolved fluorescence instruments,
which can actually measure the fluorescence from a single molecule over an elapsed
period of time.

After screening, positive clones, in [...]
encoding activities of interest is utilized, are recovered, and DNA is isolated from
positive clones utilizing techniques well known in the art. The DNA can then be
amplified either in vivo or in vitro by utilizing any of the various amplification
techniques known in the art. In vivo amplification would include transformation of the
clone(s) or subclone(s) of the clones into a viable host, followed by growth of the host.
In vitro amplification can be performed using techniques such as the polymerase chain
reaction.

The clones which are identified as having the specified activity may then be
sequenced to identify the DNA sequence encoding a bioactivity having the specified
activity. Thus, in accordance with the present invention it is possible to isolate and
identify: (i) DNA encoding a bioactivity having a specified activity, (ii) bioactivities
having such activity (including the amino acid sequence thereof) and (iii) produce
recombinant molecules having such activity.

Evolution

One advantage afforded by a recombinant approach to the discovery of novel
bioactive compounds is the ability to manipulate pathway subunits to generate and select
for variants with altered specificity. Pathway subunits can be substituted or individual
subunits can be evolved utilizing methods described below, to select for resultant
bioactive molecules with different activities.

Clones found to have the bioactivity for which the screen was performed can be
subjected to directed mutagenesis to develop new bioactivities with more desirable
properties or to develop modified bioactivities with particularly desired properties that
are absent or less pronounced in the wild-type activity, such as stability to heat or organic
solvents. Any of the known techniques for directed mutagenesis are applicable to the
invention. For example, particularly preferred mutagenesis techniques for use in
accordance with the invention include those described below.

The term "error-prone PCR" refers to a process for performing PCR under
conditions where the copying fidelity of the DNA polymerase is low, such that a high
rate of point mutations is obtained along the entire length of the PCR product. Leung,
D.W., et al., Technique, 1:11-15 (1989) and Caldwell, R.C. & Joyce, G.F., PCR Methods
Applic., 2:28-33 (1992).
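
As a rough illustration of the idea, the following minimal Python sketch simulates low-fidelity copying by substituting each base with a fixed probability. The function names, the 1% per-base error rate, the template sequence and the number of rounds are illustrative assumptions and are not taken from the cited references; real error-prone PCR shows base-specific biases that this sketch ignores.

```python
import random

BASES = "ACGT"

def error_prone_copy(template, error_rate, rng):
    """Copy a template, substituting each base with probability error_rate."""
    copy = []
    for base in template:
        if rng.random() < error_rate:
            # Point mutation: pick one of the three other bases.
            copy.append(rng.choice([b for b in BASES if b != base]))
        else:
            copy.append(base)
    return "".join(copy)

def count_mutations(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

rng = random.Random(0)  # fixed seed so the run is repeatable
template = "ATGGCTAGCAAAGGAGAAGAACTTTTCACT" * 10  # hypothetical 300-base template
product = template
for _ in range(5):  # five hypothetical rounds of low-fidelity copying
    product = error_prone_copy(product, error_rate=0.01, rng=rng)

print(count_mutations(template, product), "point mutations in", len(template), "bases")
```

Because every position is treated alike, the mutations are spread along the entire length of the product, which is the property the definition above emphasizes.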

The term "oligonucleotide directed mutagenesis" refers to a process which allows
for the generation of site-specific mutations in any cloned DNA segment of interest.
Reidhaar-Olson, J.F. & Sauer, R.T., et al., Science, 241:53-57 (1988).

The term "assembly PCR" refers to a process which involves the assembly of a
PCR product from a mixture of small DNA fragments. A large number of different PCR
reactions occur in parallel in the same vial, with the products of one reaction priming the
products of another reaction.

The term "sexual PCR mutagenesis" (also known as "DNA shuffling") refers to
forced homologous recombination between DNA molecules of different but highly
related DNA sequence in vitro, caused by random fragmentation of the DNA molecule
based on sequence homology, followed by fixation of the crossover by primer extension
in a PCR reaction. Stemmer, W.P., PNAS, USA, 91:10747-10751 (1994).

The term "in vivo mutagenesis" refers to a process of generating random
mutations in any cloned DNA of interest which involves the propagation of the DNA in
a strain of E. coli that carries mutations in one or more of the DNA repair pathways.
These "mutator" strains have a higher random mutation rate than that of a wild-type
parent. Propagating the DNA in one of these strains will eventually generate random
mutations within the DNA.

The term "cassette mutagenesis" refers to any process for replacing a small region
of a double stranded DNA molecule with a synthetic oligonucleotide "cassette" that
differs from the native sequence. The oligonucleotide often contains completely and/or
partially randomized native sequence.
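
The replacement step in cassette mutagenesis can be pictured with a short Python sketch. It is only a string-level analogy under stated assumptions: the helper names, the example sequence, the cassette position and the six-base fully randomized cassette are hypothetical and do not come from the source.

```python
import random

def random_cassette(length, rng):
    """Return a fully randomized oligonucleotide of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def replace_region(sequence, start, cassette):
    """Swap len(cassette) bases of sequence, beginning at start, for the cassette."""
    end = start + len(cassette)
    if start < 0 or end > len(sequence):
        raise ValueError("cassette does not fit inside the sequence")
    return sequence[:start] + cassette + sequence[end:]

rng = random.Random(42)  # fixed seed so the run is repeatable
native = "ATGGCTAGCAAAGGAGAAGAACTTTTCACT"  # hypothetical native sequence
variant = replace_region(native, 9, random_cassette(6, rng))

print(native)
print(variant)
```

Only the six positions covered by the cassette can differ from the native sequence; the flanking sequence is carried over unchanged.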

The term "recursive ensemble mutagenesis" refers to an algorithm for protein
engineering (protein mutagenesis) developed to produce diverse populations of
phenotypically related mutants whose members differ in amino acid sequence. This
method uses a feedback mechanism to control successive rounds of combinatorial
cassette mutagenesis. Arkin, A.P. and Youvan, D.C., PNAS, USA, 89:7811-7815
(1992).

The term "exponential ensemble mutagenesis" refers to a process for generating
combinatorial libraries with a high percentage of unique and functional mutants, wherein
small groups of residues are randomized in parallel to identify, at each altered position,
amino acids which lead to functional proteins, Delegrave, S. and Youvan, D.C.,
Biotechnology Research, 11:1548-1552 (1993); and random and site-directed
mutagenesis, Arnold, F.H., Current Opinion in Biotechnology, 4:450-455 (1993).

All of the references mentioned above are hereby incorporated by reference in
their entirety. Each of these techniques is described in detail in the references mentioned.

DNA encoding desirable molecules identified utilizing the methods of the present
invention can be mutagenized, or "evolved", utilizing any one or more of these
techniques, and rescreened utilizing the methods of the present invention to identify more
desirable clones.

The invention will now be illustrated by the following working examples, which
are in no way a limitation thereof.

Example 1
Sample Collection

Samples to be utilized may be collected according to the following example:

The following represents a method of selective in situ enrichment of bacterial
and archaeal species while at the same time inhibiting the proliferation of eukaryotic
members of the population.

In situ enrichment is achieved by using traps composed of growth substrates
and nutritional amendments, coated onto surfaces, with the intent to lure, selectively,
members of the surrounding environmental matrix. Choice of substrates (carbon
sources) and nutritional amendments (i.e., nitrogen, phosphorus, etc.) is dependent
upon the members of the community one desires to enrich. Selective agents against
eukaryotic members are also added to the trap. Again, the exact composition will
depend upon which members of the community one desires to enrich and which
members of the community one desires to inhibit. Substrates include monomers and
polymers. Monomers of substrates, such as glucosamine, cellulose, pentanoic or
other acids, xylan, chitin, etc., can be utilized for attraction of certain types of
microbes. Polymers can also be used to attract microbes that can degrade them.
Some of the enrichment media which may be useful in pulling out particular members
of the community are described below:

1. Addition of bioactive compounds against fungi and microscopic eukaryotes:

Proliferation of eukaryotic members of the community may be inhibited by the
use of one or more commercially available compounds such as nystatin,
cycloheximide, and/or pimaricin. These compounds may be sprinkled as a powder or
incorporated as a liquid in the bug trap medium.

2. Addition of bioactive compounds against other bacterial species:

Compounds which inhibit the growth of some bacterial species but not others
(e.g., polymyxin, penicillin, and rifampin) may be incorporated into the enrichment
medium. Use of the compounds is dependent upon which members of the bacterial
community one desires to enrich. For example, while a majority of the Streptomyces
are sensitive to polymyxin, penicillin, and rifampin, these may be used to enrich for
rare members of the family which are resistant. Selective agents may also be used in
enrichments for archaeal members of the community.

3. Use of carbon sources as selective agents:

Any particular carbon source can be utilized by some members of the
community and not others. Carbon source selection thus depends upon the members
of the community one desires to enrich. For example, members of the
Streptomycetales tend to utilize complex, polymeric substrates such as cellulose,
chitin, and lignin. These complex substrates, while utilized by other genera, are
recalcitrant to most bacteria. These complex substrates are also utilized by fungi, which
necessitates the use of the anti-fungal agents mentioned above.

4. Addition of nitrogen sources:

The use of additional nitrogen sources may be called for depending upon the
choice of carbon source. For example, while chitin is balanced in its C:N ratio,
cellulose is not. To enhance utilization of cellulose (or other carbon-rich substrates),
it is often useful to add nitrogen sources such as nitrate or ammonia.

5. Addition of trace elements:

In general, the environmental matrix tends to be a good source of trace
elements, but in certain environments, the elements may be limiting. Addition of
trace elements may enhance growth of some members of the community while
inhibiting others.

Large surface area materials, such as glass beads or silica aerogels, can be utilized as
surfaces in the present example. This allows a high concentration of microbes to be
collected in a relatively small device holding multiple collections of substrate-surface
conjugates.

Glass beads can be derivatized with N-acetyl-β-D-glucosamine-phenylisothiocyanate
as follows:
Bead Preparation:

Mix 30 ml glass beads (Biospec Products, Bartlesville, OK) with 50 ml
APS/Toluene (10% APS) (Sigma Chemical Co.)
Reflux overnight
Decant and wash 3 times with Toluene
Wash 3 times with ethanol and dry in oven

Derivatize with N-acetyl-β-D-glucosamine-phenylisothiocyanate as follows:

Combine in Falcon Tube:
25 ml prepared glass beads from above,
15 ml 0.1 M NaHCO3 + 25 mg N-acetyl-β-D-glucosamine-PITC (Sigma
Chemical Co.) + 1 ml DMSO
Add 10 ml NaHCO3 + 1 ml DMSO
Pour over glass beads
Let shake in Falcon Tube overnight
Wash with 20 ml 0.1 M NaHCO3
Wash with 50 ml ddH2O
Dry at 55°C for 1 hour

Beads can then be placed in mesh filter "bags" (Spectrum, Houston, Texas) created to
allow containment of the beads, while simultaneously allowing migration of microbes.
The bags are then placed in any device used as a solid support which allows containment
of the bag. Particularly preferred devices are made of inert materials, such as plexiglass.
Alternatively, beads can be placed directly into Falcon Tubes (VWR, Fisher Scientific)
which have been punctured with holes using a needle. These "containment" devices are
then deployed in desired biotopes for a period of time to allow attraction and growth of
desirable microbes.

The following protocol details one method for generating a simple "bug trap":

Puncture holes using a heated needle or other pointed device into a 15 ml Falcon Tube
(VWR, Fisher Scientific).

Place approximately 1-5 ml of the derivatized beads into a Spectra/mesh nylon filter,
such as those available from Spectrum (Houston, Texas) with a mesh opening of 70 µm,
an open area of 43%, and a thickness of 70 µm. Seal the nylon filter to create a "bag"
containing the beads using, for instance, Goop Household Adhesive & Sealant.

Place the filter containing the beads into the ventilated Falcon Tube and deploy the tube
into the desired biotope for a period of time (typically days).

Example 2

DNA Isolation and Library Construction from Cultivated Organism

The following outlines the procedures used to generate a gene library from an
isolate, Streptomyces rimosus.

Isolate DNA.

1. Inoculate 25 ml Trypticase Soy Broth (BBL Microbiology Systems) in 250 ml
baffled Erlenmeyer flasks with spores of Streptomyces rimosus. Incubate at 30°C at
250 rpm for 48 hours.

2. Collect mycelia by centrifugation. Use 50 ml conical tubes and centrifuge at
25°C at 4000 rpm for 10 minutes.

3. Decant supernatant and wash pellet 2X with 10 ml 10.3% sucrose (centrifuge as
above between washes).

4. Store pellet at -20°C for future use.

5. Resuspend pellet in 40 ml TE (10 mM Tris, 1 mM EDTA; pH 7.5) containing
lysozyme (1 mg/ml; Sigma Chemical Co.) and incubate at 37°C for 45 minutes.

6. Add sarcosyl (N-lauroylsarcosine, sodium salt, Sigma Chemical Co.) to final
concentration of 1% and invert gently to mix [...].

7. Transfer 20 ml of preparation to clean tube and add proteinase K (Stratagene
Cloning Systems) to a final concentration of 1 mg/ml. Incubate overnight at 50°C.

8. Extract 2X with phenol (saturated with TE).

9. Extract 1X with phenol:CHCl3.

10. Extract 1X with CHCl3:isoamyl alcohol.

11. Precipitate DNA with 2 volumes of EtOH.

12. Spool DNA on sealed Pasteur pipet.

13. Rinse with 70% EtOH.

14. Dry in air.

15. Resuspend DNA in 1 ml TE and store at 4°C to rehydrate slowly.

16. Check quality of DNA:

A. Digest 10 µl DNA with EcoRI restriction enzyme (Stratagene Cloning Systems)
according to manufacturer's protocol; electrophorese DNA digest through 0.5% agarose
at 20 V overnight; stain gel in 1 µg/ml EtBr.

17. Determine DNA concentration (A260-A280).
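
The concentration check in the final step can be sketched in Python using the common laboratory convention that one A260 unit of double-stranded DNA corresponds to roughly 50 µg/ml at a 1 cm path length, with the A260/A280 ratio as a purity indicator. The source does not spell out the calculation, so the function names, the conversion factor and the example readings below are assumptions for illustration.

```python
def dsdna_concentration_ug_per_ml(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration from absorbance at 260 nm.

    Assumes 1 A260 unit ~ 50 ug/ml of double-stranded DNA (1 cm path length).
    """
    return a260 * 50.0 * dilution_factor

def purity_ratio(a260, a280):
    """A260/A280 ratio; values near 1.8 are usually taken to indicate clean DNA."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280

# Hypothetical readings taken on a 1:10 dilution of the resuspended DNA.
a260, a280 = 0.20, 0.11
print(dsdna_concentration_ug_per_ml(a260, dilution_factor=10.0), "ug/ml")
print(round(purity_ratio(a260, a280), 2), "A260/A280")
```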

Restriction Digest DNA

1. Incubate the following at 37°C for 3 hours:

8 µl DNA (~10 µg)
35 µl H2O
5 µl 10x restriction enzyme buffer
2 µl EcoRI restriction enzyme (200 units)

2. Examine on agarose minigel.
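
A quick arithmetic check of the digest mix, sketched in Python. It reads the four volumes as microliters and the DNA mass as micrograms (the unit symbols appear to have been lost in extraction), which is an assumption; the function and variable names are hypothetical.

```python
def reaction_summary(components_ul, buffer_key, buffer_stock_x):
    """Return (total volume in ul, final buffer strength in X) for a reaction mix."""
    total_ul = sum(components_ul.values())
    final_buffer_x = buffer_stock_x * components_ul[buffer_key] / total_ul
    return total_ul, final_buffer_x

# Volumes as read from the digest recipe above (assumed to be microliters).
digest = {"DNA": 8.0, "H2O": 35.0, "10x buffer": 5.0, "EcoRI": 2.0}
total_ul, final_buffer_x = reaction_summary(digest, "10x buffer", 10.0)

# 200 units of enzyme acting on roughly 10 ug of DNA.
units_per_ug = 200.0 / 10.0

print(total_ul, "ul total;", final_buffer_x, "X buffer;", units_per_ug, "units per ug DNA")
```

Under that reading the components add up to a 50 µl reaction in which the 10x buffer is diluted to its working 1X strength.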
Sucrose Gradient

1. Prepare small sucrose gradient (Sambrook, Fritsch and Maniatis, 1989) and
run DNA at 45,000 rpm for 4 hours at [...].
2. Examine 5 µl of each fraction on 0.8% agarose gel.
3. Pool relevant fractions and precipitate DNA with 2.5 volumes of EtOH for 1
hour at -70°C.
4. Collect DNA by centrifugation at 13,200 rpm for 15 minutes.
5. Decant and wash with 1 ml of 70% EtOH.
6. Dry, resuspend in 15 µl TE.
7. Store at 4°C.

Dephosphorylate DNA

1. Dephosphorylate DNA with shrimp alkaline phosphatase according to
manufacturer's protocol (US Biochemicals).

Adaptor Ligation

1. Ligate adaptors according to manufacturer's protocol.

Briefly, gently resuspend DNA in EcoRI-BamHI adaptors (Stratagene Cloning
Systems); add 10X ligation buffer, 10 mM rATP, and T4 DNA ligase and incubate at
room temperature for 4-6 hours.

Preparation of Fosmid Arms

1. Fosmid arms can be prepared as described (Kim, et al., Nucl. Acids Res.,
20:10832-10835, 1992). Plasmid DNA can be digested with PmeI restriction enzyme
(New England Biolabs) according to the manufacturer's protocol, dephosphorylated
(Sambrook, Fritsch and Maniatis, 1989), followed by a digestion with BamHI restriction
enzyme (New England Biolabs) according to the manufacturer's protocol, and another
dephosphorylation step to generate two arms, each of which contains a cos site in the
proper orientation for the cloning and packaging of ligated DNA between 35-45 kbp.

Ligation to Fosmid Arms

1. Prepare the ligation reaction:

A. Add ~50 ng each of insert and vector DNA to 1 U of T4 DNA ligase (Boehringer
Mannheim) and 10X ligase buffer as per manufacturer's instructions; add H2O if
necessary, to total 10 µl.

2. Incubate overnight at 16°C.

Package and Plate

1. Package the ligation reactions using Gigapack XL packaging system (Stratagene
Cloning Systems, Inc.) following manufacturer's protocol.
2. Transfect E. coli strain DH10B (Bethesda Research Laboratories, Inc.) according
to manufacturer's protocol and spread onto LB/Chloramphenicol plates (Sambrook,
Fritsch and Maniatis, 1989).

Example 3 

Preparation of an Uncultivated Prokaryotic DNA Library

Figure 1 shows an overview of the procedures used to construct an 
environmental library from a mixed picoplankton sample. The goal was to construct 
a stable, large insert DNA library representing picoplankton genomic DNA. 

Cell collection and preparation of DNA. Agarose plugs containing
concentrated picoplankton cells were prepared from samples collected on an
oceanographic cruise from Newport, Oregon to Honolulu, Hawaii. Seawater (30
liters) was collected in Niskin bottles, screened through 10 µm Nitex, and concentrated
by hollow fiber filtration (Amicon DC10) through 30,000 MW cutoff polysulfone
filters. The concentrated bacterioplankton cells were collected on a 0.22 µm, 47 mm
Durapore filter, and resuspended in 1 ml of 2X STE buffer (1 M NaCl, 0.1 M EDTA,
10 mM Tris, pH 8.0) to a final density of approximately 1 x 10^[...] cells per ml. The
cell suspension was mixed with one volume of 1% molten Seaplaque LMP agarose
(FMC) cooled to 40°C, and then immediately drawn into a 1 ml syringe. The syringe
was sealed with parafilm and placed on ice for 10 min. The cell-containing agarose
plug was extruded into 10 ml of Lysis Buffer (10 mM Tris pH 8.0, 50 mM NaCl, 0.1 M
EDTA, 1% Sarkosyl, 0.2% sodium deoxycholate, 1 mg/ml lysozyme) and incubated
at 37°C for one hour. The agarose plug was then transferred to 40 ml of ESP Buffer
(1% Sarkosyl, 1 mg/ml proteinase K, in 0.5 M EDTA), and incubated at 55°C for 16
hours. The solution was decanted and replaced with fresh ESP Buffer, and incubated
at 55°C for an additional hour. The agarose plugs were then placed in 50 mM EDTA
and stored at 4°C shipboard for the duration of the oceanographic cruise.

One slice of an agarose plug (72 µl) prepared from a sample collected off the
Oregon coast was dialyzed overnight at 4°C against 1 ml of buffer A (100 mM NaCl,
10 mM Bis Tris Propane-HCl, 100 µg/ml acetylated BSA; pH 7.0 at 25°C) in a 2 ml
microcentrifuge tube. The solution was replaced with 250 µl of fresh buffer A
containing 10 mM MgCl2 and 1 mM DTT and incubated on a rocking platform for 1
hr at room temperature. The solution was then changed to 250 µl of the same buffer
containing 4 U of Sau3A1 (NEB), equilibrated to 37°C in a water bath, and then
incubated on a rocking platform in a 37°C incubator for 45 min. The plug was
transferred to a 1.5 ml microcentrifuge tube and incubated at 68°C for 30 min to
inactivate the protein, e.g. enzyme, and to melt the agarose. The agarose was digested
and the DNA dephosphorylated using Gelase and HK-phosphatase (Epicentre),
respectively, according to the manufacturer's recommendations. Protein was removed
by gentle phenol/chloroform extraction and the DNA was ethanol precipitated,
pelleted, and then washed with 70% ethanol. This partially digested DNA was
resuspended in sterile H2O to a concentration of 2.5 ng/µl for ligation to the pFOS1
vector.

PCR amplification results from several of the agarose plugs (data not shown)
indicated the presence of significant amounts of archaeal DNA. Quantitative
hybridization experiments using rRNA extracted from one sample, collected at 200 m
of depth off the Oregon Coast, indicated that planktonic archaea in this assemblage
comprised approximately 4.7% of the total picoplankton biomass (this sample
corresponds to "PAC1"-200 m in Table 1 of DeLong et al., High abundance of
Archaea in Antarctic marine picoplankton, Nature, 371:695-698, 1994). Results from
archaeal-biased rDNA PCR amplification [...]
confirmed the presence of relatively large amounts of archaeal DNA in this sample.
Agarose plugs prepared from this picoplankton sample were chosen for subsequent
fosmid library preparation. Each 1 ml agarose plug from this site contained
approximately 7.5 x 10^[...] cells, therefore approximately 5.4 x 10^[...] cells were
present in the 72 µl slice used in the preparation of the partially digested DNA.

Vector arms are prepared from pFOS1 as described (Kim et al., Stable
propagation of cosmid sized human DNA inserts in an F factor based vector, Nucl.
Acids Res., 20:10832-10835, 1992). Briefly, the plasmid is completely digested with
AstII, dephosphorylated with HK phosphatase, and then digested with BamHI to
generate two arms, each of which contains a cos site in the proper orientation for
cloning and packaging ligated DNA between 35-45 kbp. The partially digested
picoplankton DNA is ligated overnight to the pFOS1 arms in a 15 µl ligation reaction
containing 25 ng each of vector and insert and 1 U of T4 DNA ligase (Boehringer-
Mannheim). The ligated DNA in four microliters of this reaction is in vitro packaged
using the Gigapack XL packaging system (Stratagene), the fosmid particles
transfected to E. coli strain DH10B (BRL), and the cells spread onto LBcm15 plates.
The resultant fosmid clones are picked into 96-well microtiter dishes containing
LBcm15 supplemented with 7% glycerol. Recombinant fosmids, each containing ca. 40
kb of picoplankton DNA insert, have yielded a library of 3,552 fosmid clones,
containing approximately 1.4 x 10^8 base pairs of cloned DNA. All of the clones
examined contained inserts ranging from 38 to 42 kbp. This library is stored frozen at
-80°C for later analysis.
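
The library figures quoted above can be cross-checked with a few lines of Python. The clone count (3,552), the approximate insert size (40 kb) and the 96-well format are taken from the text; the variable names are illustrative only.

```python
CLONES = 3552            # fosmid clones in the library
AVG_INSERT_BP = 40_000   # approximate insert size per clone (ca. 40 kb)
WELLS_PER_PLATE = 96     # 96-well microtiter dishes

cloned_bp = CLONES * AVG_INSERT_BP
plates, leftover = divmod(CLONES, WELLS_PER_PLATE)

print(f"{cloned_bp:.1e} bp cloned; {plates} full plates, {leftover} clones left over")
```

The product of clone count and insert size is about 1.4 x 10^8 base pairs, and the clone count divides evenly into 96-well plates.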
Example 4
Normalization of DNA from Environmental Samples

Prior to library generation, purified DNA from an environmental sample can be
normalized. DNA is first fractionated according to the following protocol:

A sample composed of genomic DNA is purified on a cesium-chloride gradient.
The cesium chloride (Rf = 1.3980) solution is filtered through [...]
ml is loaded into a 35 ml OptiSeal tube (Beckman). The DNA is added and
thoroughly mixed. Ten micrograms of bis-benzimide (Sigma; Hoechst 33258) is
added and mixed thoroughly. The tube is then filled with the filtered cesium chloride
solution and spun in a VTi50 rotor in a Beckman L8-70 Ultracentrifuge at 33,000 rpm
for 72 hours. Following centrifugation, a syringe pump and fractionator (Brandel
Model 186) are used to drive the gradient through an ISCO UA-5 UV absorbance
detector set to 280 nm. Peaks representing the DNA from the organisms present in an
environmental sample are obtained.

Normalization is then accomplished as follows:

I. Double-stranded DNA sample is resuspended in hybridization buffer ([...]
NaH2PO4, pH 6.8/0.82 M NaCl/1 mM EDTA/0.1% S[...]).

II. Sample is overlaid with mineral oil and denatured by boiling for 10 minutes.

III. Sample is incubated at 68°C for 12-36 hours.

IV. Double-stranded DNA is separated from single-stranded DNA according to
standard protocols (Sambrook, 1989).

V. The single-stranded DNA fraction is desalted and amplified by PCR.

The process is repeated for several more rounds (up to 5 or more).
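
Why retaining the single-stranded fraction evens out abundances can be sketched with an idealized second-order reassociation model, in which the concentration of a sequence still single-stranded after time t is C0 / (1 + k·C0·t). This model, the rate constant, the time and the starting abundances are assumptions introduced for illustration; they are not stated in the protocol above.

```python
def single_stranded_remaining(c0, k, t):
    """Concentration still single-stranded after time t (ideal second-order kinetics)."""
    return c0 / (1.0 + k * c0 * t)

def normalize(abundances, k, t):
    """Apply one denature/reanneal round and keep only the single-stranded fraction."""
    return {name: single_stranded_remaining(c0, k, t) for name, c0 in abundances.items()}

# Hypothetical starting abundances (arbitrary units): one common and one rare genome.
before = {"abundant": 100.0, "rare": 1.0}
after = normalize(before, k=1.0, t=1.0)

ratio_before = before["abundant"] / before["rare"]
ratio_after = after["abundant"] / after["rare"]
print(ratio_before, round(ratio_after, 2))
```

In this toy model the abundant sequence reanneals much faster than the rare one, so the ratio between them in the single-stranded fraction shrinks; repeating the round, as the protocol describes, would compress the ratio further.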
Example 5

Hybridization Screening of Libraries Generated in Prokaryotes and Expression

Hybridization screening may be performed on fosmid clones from a library
generated according to the protocol described in Example 3 above in any fosmid vector.
For instance, the pMF3 vector is a fosmid based vector which can be used for efficient
yet stable cloning in E. coli and which can be integrated and maintained stably in
Streptomyces coelicolor or Streptomyces lividans. A pMF3 library generated according
to the above protocol is first transformed into E. coli DH10B cells. Chloramphenicol
resistant transformants containing tcm or oxy are identified by screening the library by
colony hybridization using sequences designed from previously published sequences of
oxy and tcm genes (27, 28). Colony hybridization screening is described in detail in
"Molecular Cloning", A Laboratory Manual, Sambrook, et al., (1989) 1.90-1.104.
Colonies that test positive by hybridization can be purified and their fosmid clones
analyzed by restriction digestion and PCR to confirm that they contain the complete
biosynthetic pathway. (See Figure 6).

Alternatively, DNA from the abovementioned fosmid clones may be used in an
amplification reaction designed to identify clones positive for an entire pathway. For
example, the following sequences may be employed in an amplification reaction to
amplify a pathway encoding the antibiotic gramicidin (gramicidin operon [...])
on a 34 kbp DNA fragment potentially encoded on one fosmid clone:

Primers:

5'CACACGGATCCGAGCTCATCGATAGGCATGTGTTTAACTTCTTGTCATC3'
5'CTTATTGGATCCGAGCTCAATTGCTGAAGAGTTGAAGGAGAGCATCTTCC3'
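
Both primers carry short recognition motifs near their 5' ends. The Python sketch below locates the standard BamHI (GGATCC) and SacI (GAGCTC) recognition sequences in the two primer strings. The source does not say which sites, if any, were intended for later cloning, so treating these motifs as deliberate is an assumption; the function name is hypothetical.

```python
# Standard recognition sequences for two common restriction enzymes.
SITES = {"BamHI": "GGATCC", "SacI": "GAGCTC"}

def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for every motif occurrence in seq."""
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        hits[enzyme] = positions
    return hits

PRIMER_1 = "CACACGGATCCGAGCTCATCGATAGGCATGTGTTTAACTTCTTGTCATC"
PRIMER_2 = "CTTATTGGATCCGAGCTCAATTGCTGAAGAGTTGAAGGAGAGCATCTTCC"

for name, primer in (("primer 1", PRIMER_1), ("primer 2", PRIMER_2)):
    print(name, find_sites(primer))
```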
Amplification reaction:

[...] fosmid/insert DNA
5 µl each primer (50 ng/µl)
1 µl Boehringer Mannheim EXPAND Polymerase from their EXPAND kit
1 µl dNTP's
5 µl 10X Buffer #3 from Boehringer Mannheim EXPAND kit
30 µl ddH2O

PCR Reaction Program:

94°C 60 seconds
20 cycles of:
94°C 10 seconds
65°C 30 seconds
68°C 15 minutes
one cycle of:
[...] 7 minutes
Store at 4°C.
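
The long 15-minute extension step dominates the run time of this program. A small Python calculation makes that concrete, reading the program as a 60-second initial denaturation, 20 cycles of 10 seconds, 30 seconds and 15 minutes, and a 7-minute final step. That reading of the garbled listing is an assumption, ramp times between temperatures are ignored, and the function name is hypothetical.

```python
def program_seconds(initial_s, cycles, cycle_steps_s, final_s):
    """Total hold time of a thermocycler program, ignoring temperature ramps."""
    return initial_s + cycles * sum(cycle_steps_s) + final_s

total_s = program_seconds(
    initial_s=60,                   # initial denaturation
    cycles=20,
    cycle_steps_s=[10, 30, 15 * 60],  # denature, anneal, extend
    final_s=7 * 60,                 # final extension
)
hours, remainder = divmod(total_s, 3600)
minutes, seconds = divmod(remainder, 60)
print(f"{total_s} s total = {hours} h {minutes} min {seconds} s")
```

The extension step alone accounts for five of the roughly five and a half hours, which is consistent with amplifying a target in the tens of kilobases.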

Fosmid DNA from clones that are shown to contain the oxytetracycline or
tetracenomycin polyketide encoding DNA sequences is then used to transform S.
lividans TK24 Δact protoplasts [...]
overlaying regeneration plates with hygromycin [...]
screened for bioactivity by overlaying transformation plates with 2 ml of nutrient soft
agar containing cells of the test organisms Escherichia coli or Bacillus subtilis. E. coli
is resistant to the thiostrepton concentration (50 µg/ml) to be used in the overlays of
pMF3 clones but is sensitive to oxytetracycline at a concentration of 5 µg/ml (29). The B.
subtilis test strain is rendered resistant to thiostrepton prior to screening by transforming
with a thiostrepton marker carried on pHT315 (30). Bioactivity is demonstrated by
inhibition of growth of the particular test strain around the S. lividans colonies. To
confirm bioactivity, presumptive active clones are isolated and cultures extracted using
a moderately polar solvent, methanol. Extractions are prepared by addition of methanol
in a 1:1 ratio with the clone fermentation broth followed by overnight shaking at 4°C.
Cell debris and media solids in the aqueous phase are then separated by
centrifugation. Recombinantly expressed compounds are recovered in the solvent phase
and may be concentrated or diluted as necessary. Extracts of the clones are aliquoted
onto 0.25-inch filter disks, the solvent allowed to evaporate, and the disks then placed
on the surface of an overlay containing the assay organisms. Following incubation at
appropriate temperatures, the diameter of the clearing zones is measured and recorded.
Diode array HPLC, using authentic oxytetracycline and tetracenomycin as standards, can
be used to confirm expression of these antibiotics from the recombinant clones.

Rescue of chromosomally integrated pathways

Sequence analysis of chromosomally integrated pathways identified by screening can be
performed for confirmation of the bioactive molecule. One approach which can be taken
to rescue fosmid DNA from S. lividans clones exhibiting bioactivity against the test
organisms is based on the observation that plasmid vectors containing IS117, such as
pMF3, are present as circular intermediates at a frequency of 1 per 10-30 chromosomes
(31). The presumptive positive clones can be grown in 25 ml broth cultures and plasmid
DNA isolated by standard alkaline lysis procedures. Plasmid DNA preps are then used
to transform E. coli and transformants are selected for CmR by plating onto LB containing
chloramphenicol (15 µg/ml). Fosmid DNA from the E. coli CmR transformants is
isolated and analyzed by restriction digestion analysis, PCR, and DNA sequencing.

Example 6

Screening Libraries of Genes for Compounds Affecting the Interaction of Other
Molecules in Prokaryotes

Large insert libraries generated according to Examples 2 and 3 can be screened
for compounds which affect the interaction of other molecules using the following
method(s):

Genes encoding two interacting proteins are fused to a wild type and a mutant
LexA DNA binding domain (the mutant is a truncated LexA protein devoid of its own
oligomerization domain and is termed LexA408). LexA is an efficient transcriptional
repressor in E. coli only if it acts as a dimer. This property is used to exchange the LexA
dimerization domain for heterologous interacting motifs to recover repression. The
non-covalent interaction between the hybrid proteins is probed by their capacity to
restore the repressor activity of truncated LexA proteins (LexA408).

The interaction or association of the fused proteins is specifically measured using
a reporter gene controlled by a hybrid sulA operator containing a wild type half-site and
a mutated half-site (op408/op+) in a reporter strain (SU202). The lacZ reporter gene is
under control of the op408/op+ hybrid operator using the sulA promoter, the most tightly
repressed naturally occurring SOS promoter.

SU202 cells containing the interacting proteins from above are co-transformed
with a library of genes expressing small molecules, such as those generated in the
Examples above. Cells are then screened for GFP expression, an indication of inhibition
of the protein-protein interaction.
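
The logic of this repressor-based screen can be summarized as a small truth table. The Python sketch below is a deliberately simplified boolean model, assuming that repression requires association of the two LexA fusion proteins and that a library-encoded inhibitor fully blocks that association; the function name and the all-or-nothing behavior are illustrative assumptions, not measured properties of the system.

```python
def reporter_expressed(fusions_interact, inhibitor_present):
    """Return True when the reporter would be expected to be switched on."""
    # Repression requires the two LexA fusion proteins to associate,
    # which in this toy model an inhibitor prevents completely.
    repressor_assembled = fusions_interact and not inhibitor_present
    return not repressor_assembled

for interact in (True, False):
    for inhibitor in (True, False):
        state = "ON" if reporter_expressed(interact, inhibitor) else "OFF"
        print(f"interact={interact!s:5} inhibitor={inhibitor!s:5} reporter {state}")
```

In this model a cell carrying an interacting pair keeps the reporter off unless the library clone disrupts the interaction, which is why reporter expression is read as a positive result.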
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Example 7 

Screening Libraries of Genes for Compounds Affecting the Interaction of Other 

Molecules in Eukarvotes 

Commercially available two-hybn3~sySems can be purcliasea~fronrClontech 
5 Laboratories (Palo Alto, California) or Stratagene Cloning Systems (La JoUa, California). 
Genes encoding interacting molecules are cloned into vectors provided, and 
cotransformed into appropriate yeast strains provided. Interaction can be confirmed 
utilizing methods provided. Cells containing the interacting proteins are then transformed 
utilizing methods well known in the art with a library of genes expressing small 
10 molecules, such as those generated in the Examples above. Cells are then assayed for 
; an increase or decrease of the expression levels of the detectable molecule ( b- 
galactosidase) an indication of an effect on the protein-protein interaction. 
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Claims 

What is claimed is: ' 

T ATmetHoa^o^f evaluating a ^c^^^ 

a first test protein linked to a *DNA binding moiety and a second test protein 
linked to a transcriptional activation moiet}', comprising contacting said 
compound with said first test protein linked to a DNA binding moiety or second 
•test protein linked to a transcriptional activation moiety and determining the 
ability of said compound to regulate the interaction of said first test protein linked 
to a DNA binding moiet\' with said second test protein covalently linked to a 
transcriptional activation moiet\*, wherein said regulation enliances or inhibits the 
expression of a detectable protein. 

2. The method of claim 1 , wherein the DNA binding moiet\' and the transcriptional 
activation moiety are derived from a single.transcriptional. activator. 

3 . The method of claim 1 , wherein the DNA binding moiety and the transcriptional 
activation moiet>' are derived ft-om different proteins. 

4. The method of claim 1 , wherein said detectable protein is selected from the group 
■ consisting of beta-galactosidase, green fluorescent protein, luciferase, alkaline 

phosphatase and chloramphenical acetyl transferase 

5 . The method of claim 1 , wherein the compound is a protein. 

6. The method of claim 5, wherein the protein is encoded by a polynucleotide. 

7. The method of clakn 6, wherein the polynucleotide is contained in an expression 
vector in operable linkage. 
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8.^ The method of elaim 1 , wherein the compound is a bioactive molecule. , 

9\ The method of claim'8, wherein the bioactive molecule is a polyketide., ' 

10. the rriethod of clairn 9, wherein the polyketide is a product, of an enzymatic 
process encoded by an operon, or portions thereof . . 

11 : The method of claim' 10, wherein the pperon, "or portions thereof, is contained in 
/ an expression vector in operable linkage, . • 

12. , The rriethod of claim 10, w^herein^the operon, or portions thereof; is derived 'from 
\ . uncultivated microorganisms. ' ,. ■ • , . y 

13. The method of claim 12, wherein the uncultivated microorganisms comprise a 
mixture of terrestrial microorganisms, a mixture of marine microorganisms, or 
a mixture of terrestrial microorganisms and marine microorganisms. 

14. The method of claim 12, wherein the uncultivated microorganisms are 
extremophiles. 

15. The method of claim 14, wherein the extremophiles are selected from the group 
consisting of thermophiles, hyperthermophiles, psychrophiles, and psychrotrophs. 

Figure 5 

Capturing Large Genome Fragments From the Environment 

1. Concentrate bacteria, digest protein and preserve high-MW (>100 kbp) DNA. 
   (Agarose "noodle" protein extraction.) 

2. Partially digest DNA. Separate by PFGE. 
   Size select for cloning. 

3. Ligate to fosmid arms, package and transfect 
   to E. coli. Array library in microtiter plates. 


INTERNATIONAL SEARCH REPORT 

International application No. PCT/US98/21895 


A. CLASSIFICATION OF SUBJECT MATTER 

IPC(6): C12Q 1/68; G01N 33/53 
US CL: 435/6, 7.1 

According to International Patent Classification (IPC) or to both national classification and IPC 


B. FIELDS SEARCHED 

Minimum documentation searched (classification system followed by classification symbols) 
U.S.: 435/6, 7.1 

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched 

Electronic data base consulted during the international search (name of data base and, where practicable, search terms used) 
Please See Extra Sheet. 



C. DOCUMENTS CONSIDERED TO BE RELEVANT 

Category* / Citation of document, with indication, where appropriate, of the relevant passages / Relevant to claim No. 

X   US 5,525,490 A (ERICKSON et al.) 11 June 1996, entire document, 
    especially columns 15-16. 
    Relevant to claim No.: 1-8 

X   MENDELSOHN et al. Applications of interaction traps/two-hybrid 
    systems to biotechnology research. Curr. Opin. Biotechnol. October 
    1994, Vol. 5, pages 482-486, especially page 485. 
    Relevant to claim No.: 1-4 

X   CHIU et al. RAPT1, a mammalian homolog of yeast Tor, interacts 
    with the FKBP12/rapamycin complex. Proc. Natl. Acad. Sci. USA. 
    20 December 1994, Vol. 91, No. 26, pages 12574-12578, see entire 
    document. 
    Relevant to claim No.: 1, 3, 4, 8, 9 



[X] Further documents are listed in the continuation of Box C.    [ ] See patent family annex. 



* Special categories of cited documents: 

"A" document defining the general state of the art which is not considered 
    to be of particular relevance 

"E" earlier document published on or after the international filing date 

"L" document which may throw doubts on priority claim(s) or which is 
    cited to establish the publication date of another citation or other 
    special reason (as specified) 

"O" document referring to an oral disclosure, use, exhibition or other 
    means 

"P" document published prior to the international filing date but later than 
    the priority date claimed 

"T" later document published after the international filing date or priority 
    date and not in conflict with the application but cited to understand 
    the principle or theory underlying the invention 

"X" document of particular relevance; the claimed invention cannot be 
    considered novel or cannot be considered to involve an inventive step 
    when the document is taken alone 

"Y" document of particular relevance; the claimed invention cannot be 
    considered to involve an inventive step when the document is 
    combined with one or more other such documents, such combination 
    being obvious to a person skilled in the art 

"&" document member of the same patent family 



Date of the actual completion of the international search 
14 DECEMBER 1998 



Name and mailing address of the ISA/US 
Commissioner of Patents and Trademarks 
Box PCT 
Washington, D.C. 20231 
Facsimile No. (703) 305-3230 



Date of mailing of the international search report 




Authorized officer 
ROBERT SCHWARTZMAN 
Telephone No. (703) 308-0196 



Form PCT/ISA/210 (second sheet)(July 1992)* 


